Performance Optimization Exercises Using TPC-H – Q12
Ⅰ SQL code and analysis
Below is the SQL query statement:
select
l_shipmode,
sum(case
when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1
else 0 end) as high_line_count,
sum(case
when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1
else 0 end) as low_line_count
from
orders,
lineitem
where
o_orderkey = l_orderkey
and l_shipmode in ('TRUCK', 'MAIL')
and l_commitdate < l_receiptdate
and l_shipdate < l_commitdate
and l_receiptdate >= date '1995-01-01'
and l_receiptdate < date '1995-01-01' + interval '1' year
group by
l_shipmode
order by
l_shipmode;
This is a grouping & aggregation operation on the filtered result set of association between primary table orders and sub table lineitem.
Ⅱ SPL solution
The optimization principle for the primary-sub table association between orders and lineitem is similar to that in Q9.
In the above SQL code, conditions for high_line_count and low_line_count are opposite. One can be computed from the other using a simpler formula.
A |
|
1 |
=now() |
2 |
1995-1-1 |
3 |
=elapse@y(A2, 1) |
4 |
=["MAIL", "TRUCK"] |
5 |
=file("lineitem.ctx").open().cursor@m(L_ORDERKEY,L_SHIPMODE;L_RECEIPTDATE>=A2 && L_RECEIPTDATE<A3 && A4.contain(L_SHIPMODE) && L_COMMITDATE<L_RECEIPTDATE && L_SHIPDATE<L_COMMITDATE) |
6 |
=file("orders.ctx").open().new@r(A5,L_SHIPMODE,O_ORDERPRIORITY) |
7 |
=A6.run(O_ORDERPRIORITY=if(O_ORDERPRIORITY=="1-URGENT" || O_ORDERPRIORITY=="2-HIGH",1,0)) |
8 |
=A7.groups(L_SHIPMODE;sum(O_ORDERPRIORITY):high_line_count, sum(1-O_ORDERPRIORITY):low_line_count) |
9 |
=interval@ms(A1,now()) |
Ⅲ Further optimization
1. Optimization method
In this example, we will use the date-integer conversion method explained in Q1 – lineitem table’s L_SHIPDATE field, L_COMMITDATE field and L_RECEIPTDATE have been converted in the previous examples, and the string-integer conversion method explained in the same essay – orders table’s O_ORDERPRIORITY is already converted. Here we just need to convert lineitem table’s L_SHIPMODE field.
2. Code for data conversion
2.1 Conversion on orders table
Copy orders_10.ctx and rename it orders_12.ctx.
2.2 Conversion on lineitem table
A |
|
1 |
=file("lineitem.ctx").open().cursor(L_SHIPMODE) |
2 |
=A1.id(L_SHIPMODE).sort() |
3 |
=file("l_shipmode.txt").export(A2) |
4 |
=file("lineitem_10.ctx").open().cursor() |
5 |
=A4.run(L_SHIPMODE=A2.pos@b(L_SHIPMODE)) |
6 |
=file("lineitem_12.ctx").create@p(#L_ORDERKEY,#L_LINENUMBER,L_PARTKEY, L_SUPPKEY, L_QUANTITY, L_EXTENDEDPRICE,L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE,L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT) |
7 |
>A6.append(A5) |
3. Code after data conversion
Computing code:
A |
|
1 |
>l_shipmode=file("l_shipmode.txt").import@si() |
2 |
>o_orderpriority=file("o_orderpriority.txt").import@si() |
3 |
=now() |
4 |
1995-1-1 |
5 |
=days@o(elapse@y(A4, 1)) |
6 |
=days@o(A4) |
7 |
=l_shipmode.(["MAIL", "TRUCK"].contain(~)) |
8 |
=o_orderpriority.(["1-URGENT","2-HIGH"].contain(~)) |
9 |
=file("lineitem_12.ctx").open().cursor@m(L_ORDERKEY,L_SHIPMODE;L_RECEIPTDATE>=A6 && L_RECEIPTDATE<A5 && A7(L_SHIPMODE) && L_COMMITDATE<L_RECEIPTDATE && L_SHIPDATE<L_COMMITDATE) |
10 |
=file("orders_12.ctx").open().new@r(A9,L_SHIPMODE,O_ORDERPRIORITY) |
11 |
=A10.run(O_ORDERPRIORITY=if(A8(O_ORDERPRIORITY),1,0)) |
12 |
=A11.groups(L_SHIPMODE;sum(O_ORDERPRIORITY):high_line_count, sum(1-O_ORDERPRIORITY):low_line_count) |
13 |
=A12.run(L_SHIPMODE=l_shipmode(L_SHIPMODE)) |
14 |
=interval@ms(A3,now()) |
A13 restores the integer type O_ORDERPRIORITY field to the original string type one.
Ⅳ Using enterprise edition’s column-wise computation
1. Original data
A |
|
1 |
=now() |
2 |
1995-1-1 |
3 |
=elapse@y(A2, 1) |
4 |
=["MAIL", "TRUCK"] |
5 |
=file("lineitem.ctx").open().cursor@mv(L_ORDERKEY,L_SHIPMODE;L_RECEIPTDATE>=A2 && L_RECEIPTDATE<A3 && A4.contain(L_SHIPMODE) && L_COMMITDATE<L_RECEIPTDATE && L_SHIPDATE<L_COMMITDATE) |
6 |
=file("orders.ctx").open().new@r(A5,L_SHIPMODE,O_ORDERPRIORITY) |
7 |
=A6.derive@o(if(O_ORDERPRIORITY=="1-URGENT" || O_ORDERPRIORITY=="2-HIGH",1,0):flag) |
8 |
=A7.groups(L_SHIPMODE;sum(flag):high_line_count, sum(1-flag):low_line_count) |
9 |
=interval@ms(A1,now()) |
2. Optimized data
A |
|
1 |
>l_shipmode=file("l_shipmode.txt").import@si() |
2 |
>o_orderpriority=file("o_orderpriority.txt").import@si() |
3 |
=now() |
4 |
1995-1-1 |
5 |
=days@o(elapse@y(A4, 1)) |
6 |
=days@o(A4) |
7 |
=l_shipmode.(["MAIL", "TRUCK"].contain(~)) |
8 |
=o_orderpriority.(["1-URGENT","2-HIGH"].contain(~)) |
9 |
=file("lineitem_12.ctx").open().cursor@mv(L_ORDERKEY,L_SHIPMODE;L_RECEIPTDATE>=A6 && L_RECEIPTDATE<A5 && A7(L_SHIPMODE) && L_COMMITDATE<L_RECEIPTDATE && L_SHIPDATE<L_COMMITDATE) |
10 |
=file("orders_12.ctx").open().new@r(A9,L_SHIPMODE,O_ORDERPRIORITY) |
11 |
=A10.derive@o(if(A8(O_ORDERPRIORITY),1,0):flag) |
12 |
=A11.groups(L_SHIPMODE;sum(flag):high_line_count,sum(1-flag):low_line_count) |
13 |
=A12.new(l_shipmode(L_SHIPMODE):L_SHIPMODE,high_line_count,low_line_count) |
14 |
=interval@ms(A3,now()) |
Ⅴ Test result
Unit: Second
Regular |
Column-wise |
|
Before optimization |
9.9 |
7.4 |
After optimization |
6.4 |
3.5 |
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL