"Ⅰ SQL code and analysis Below is the SQL query statement: select n_name, sum(l_extendedprice * ( .."

Jerry RaqForum 14 No.
320 View • 1 Years ago

Performance Optimization Exercises Using TPC-H – Q5

Ⅰ SQL code and analysis

Below is the SQL query statement:

select
    n_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue
from
    customer,
    orders,
    lineitem,
    supplier,
    nation,
    region
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and l_suppkey = s_suppkey
    and c_nationkey = s_nationkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'ASIA'
    and o_orderdate >= date '1995-01-01'
    and o_orderdate < date '1995-01-01' + interval '1' year
group by
    n_name
order by
    revenue desc;

This is a grouping & aggregation operation on the filtered result set of multi-table association that contains a primary-sub table association between orders table and lineitem table.

Ⅱ SPL solution

The working principle is similar to that in Q3.

	A
1	=now()
2	1995-1-1
3	=elapse@y(A2,1)
4	>name="ASIA"
5	=file("region.btx").import@b().select(R_NAME==name)
6	=file("nation.btx").import@b().switch@i(N_REGIONKEY,A5:R_REGIONKEY).keys(N_NATIONKEY)
7	=file("supplier.ctx").open().cursor@m(S_SUPPKEY,S_NATIONKEY;A6.find(S_NATIONKEY)).fetch().keys@im(S_SUPPKEY)
8	=file("customer.ctx").open().cursor@m(C_CUSTKEY,C_NATIONKEY;A6.find(C_NATIONKEY)).fetch().keys@im(C_CUSTKEY)
9	=file("orders.ctx").open().cursor@m(O_ORDERKEY,O_CUSTKEY;O_ORDERDATE>=A2 && O_ORDERDATE < A3,O_CUSTKEY:A8)
10	=file("lineitem.ctx").open().news(A9,L_SUPPKEY,L_EXTENDEDPRICE,L_DISCOUNT,O_CUSTKEY;L_SUPPKEY:A7)
11	=A10.select(O_CUSTKEY.C_NATIONKEY==L_SUPPKEY.S_NATIONKEY)
12	=A11.groups@u(L_SUPPKEY.S_NATIONKEY:N_NAME;sum(L_EXTENDEDPRICE*(1-L_DISCOUNT)):revenue)
13	=A12.sort@z(revenue).run(N_NAME=A6.find(N_NAME).N_NAME)
14	=interval@ms(A1,now())

Here we use the techniques of performing filtering at cursor creation, order-based primary-sub table association, and converting join field to foreign key table pointers.

Unlike Q3, the grouping field isn’t the ordered L_ORDERKEY and group@s() function becomes unsuitable. A12 performs grouping on S_NATIONKEY instead of N_NAME because integer comparison is faster than string comparison, and replace the field with N_NAME after the grouping operation is done.

Ⅲ Further optimization

1. Optimization method

In this example, we will use the date-integer conversion optimization method explained in Q1 – the orders table’s OR_ORDERDATE field is already converted in Q3 – and the dimension table primary key numberization – region table’s R_REGIONKEY field, nation table’s N_NATIONKEY and N_REGIONKEY fields, and supplier table’s S_SUPPKEY and S_NATIONKEY fields are already converted in Q2; and orders table’s O_CUSTKEY is already converted in Q3. Here we only need to convert customer table’s C_NATIONKEY field and lineitem table’s L_SUPPKEY field.

2. Code for data conversion

2.1 Conversion on region table, nation table and supplier table

Copy region_2.btx, nation_2.btx and supplier_2.ctx, and rename them region_5.btx, nation_5.btx and supplier_5.ctx.

2.2 Conversion on orders table

Copy orders_4.ctx and name it orders_5.ctx.

2.3 Conversion on customer table

	A
1	=file("customer_3.ctx").open().import()
2	=file("nation.btx").import@b()
3	=A1.run(C_NATIONKEY=A2.pselect(N_NATIONKEY==C_NATIONKEY))
4	=file("customer_5.ctx").create(#C_CUSTKEY, C_NAME, C_ADDRESS, C_NATIONKEY, C_PHONE, C_ACCTBAL, C_MKTSEGMENT, C_COMMENT)
	>A4.append(A3)

2.4 Conversion on lineitem table

	A
1	=file("supplier.ctx").open().import(S_SUPPKEY).keys@im(S_SUPPKEY)
2	=file("lineitem_4.ctx").open().cursor()
3	=A2.run(L_SUPPKEY=A1.pfind(L_SUPPKEY))
4	=file("lineitem_5.ctx").create@p(#L_ORDERKEY,#L_LINENUMBER,L_PARTKEY, L_SUPPKEY, L_QUANTITY, L_EXTENDEDPRICE,L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE,L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT)
	>A4.append(A3)

3. Code after data conversion

The dimension tables and the value list of string fields need to be preloaded. Below is preloading code:

	A
1	>env(region, file(“region_5.btx”).import@b())
2	>env(nation, file("nation_5.btx").import@b())
3	>env(customer, file("customer_5.ctx").open().import())
4	>env(supplier, file("supplier_5.ctx").open().import())

Before performing the query, we need to first run the preloading code to load the small dimension tables into memory.

Computing code:

	A
1	=now()
2	1995-1-1
3	=days@o(elapse@y(A2,1))
4	=days@o(A2)
5	>name="ASIA"
6	=nation.(region(N_REGIONKEY).R_NAME==name)
7	=supplier.@m(if(A6(S_NATIONKEY), S_NATIONKEY,null))
8	=customer.@m(if(A6(C_NATIONKEY), C_NATIONKEY,null))
9	=file("orders_5.ctx").open().cursor@m(O_ORDERKEY,O_CUSTKEY;O_ORDERDATE>=A4 && O_ORDERDATE<A3,O_CUSTKEY:A8:#)
10	=file("lineitem_5.ctx").open().news(A9,L_SUPPKEY,L_EXTENDEDPRICE,L_DISCOUNT,O_CUSTKEY;L_SUPPKEY:A7:#)
11	=A10.select(O_CUSTKEY ==L_SUPPKEY)
12	=A11.groups@u(L_SUPPKEY:N_NAME;sum(L_EXTENDEDPRICE*(1-L_DISCOUNT)):revenue)
13	=A12.sort@z(revenue).run(N_NAME=nation(N_NAME).N_NAME)
14	=interval@ms(A1,now())

We use two optimization techniques in the code – the alignment sequence after dimension table primary key numberization and association with dimension table row numbers.

Ⅳ Using enterprise edition’s column-wise computation

1. Original data

	A
1	=now()
2	1995-1-1
3	=elapse@y(A2,1)
4	>name="ASIA"
5	=file("region.btx").import@b().select@1(R_NAME==name).R_REGIONKEY
6	=file("nation.btx").import@b().select(N_REGIONKEY==A5).derive@o().keys@i(N_NATIONKEY)
7	=file("supplier.ctx").open().cursor@mv(S_SUPPKEY,S_NATIONKEY;A6.find(S_NATIONKEY)).fetch().keys@im(S_SUPPKEY)
8	=file("customer.ctx").open().cursor@mv(C_CUSTKEY,C_NATIONKEY;A6.find(C_NATIONKEY)).fetch().keys@im(C_CUSTKEY)
9	=file("orders.ctx").open().cursor@mv(O_ORDERKEY,O_CUSTKEY;O_ORDERDATE>=A2 && O_ORDERDATE<A3,O_CUSTKEY:A8)
10	=file("lineitem.ctx").open().news(A9,L_SUPPKEY,L_EXTENDEDPRICE,L_DISCOUNT,O_CUSTKEY;L_SUPPKEY:A7)
11	=A10.select@v(O_CUSTKEY.C_NATIONKEY==L_SUPPKEY.S_NATIONKEY)
12	=A11.groups@u(L_SUPPKEY.S_NATIONKEY;sum(L_EXTENDEDPRICE*(1-L_DISCOUNT)):revenue)
13	=A12.sort@z(revenue).new(A6.find(S_NATIONKEY).N_NAME,revenue)
14	=interval@ms(A1,now())

A11 uses @v option in select() function that performs filtering on the column-wise cursor to keep the filtered cursor still as a column-wise cursor.

2. Optimized data

The dimension tables and the value list of string fields need to be preloaded. Below is preloading code:

	A
1	>env(region, file(“region_5.btx”).import@bv())
2	>env(nation, file("nation_5.btx").import@bv())
3	>env(customer, file("customer_5.ctx").open().import@v())
4	>env(supplier, file("supplier_5.ctx").open().import@v())

Before performing the query, we need to first run the preloading code to load the small dimension tables into memory.

Computing code:

	A
1	=now()
2	1995-1-1
3	=days@o(elapse@y(A2,1))
4	=days@o(A2)
5	>name="ASIA"
6	=nation.(region(N_REGIONKEY).R_NAME==name)
7	=supplier.(A6(S_NATIONKEY))
8	=customer.(A6(C_NATIONKEY))
9	=file("orders_5.ctx").open().cursor@mv(O_ORDERKEY,O_CUSTKEY;O_ORDERDATE>=A4 && O_ORDERDATE<A3 && A8(O_CUSTKEY))
10	=file("lineitem_5.ctx").open().news(A9,L_SUPPKEY,L_EXTENDEDPRICE,L_DISCOUNT,O_CUSTKEY;A7(L_SUPPKEY))
11	=A10.derive@o(supplier(L_SUPPKEY).S_NATIONKEY,customer(O_CUSTKEY).C_NATIONKEY).select@v(C_NATIONKEY==S_NATIONKEY)
12	=A11.groups@u(S_NATIONKEY;sum(L_EXTENDEDPRICE*(1-L_DISCOUNT)):revenue)
13	=A12.sort@z(revenue).new(nation(S_NATIONKEY).N_NAME,revenue)
14	=interval@ms(A1,now())

A9 uses A8(O_CUSTKEY) to only filter data at cursor creation without performing switch association. The reason is explained in Q2, and same operation for A10. A11 uses derive@o() to add two new columns and select@v to keep result as a column-wise cursor.

Ⅴ Test result

Unit: Second

	Regular	Column-wise
Before optimization	23.5	9.8
After optimization	11.5	5.7

SPL Official Website 👉 https://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL

SPL Learning Material 👉 https://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/cFTcUNs7

Youtube 👉 https://www.youtube.com/@esProc_SPL

esProc

Jerry • 320 View • 1 Years ago