Performance optimization case course: TPCH-Q18
select * from (
select
c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice,
sum(l_quantity)
from
customer, orders, lineitem
where
o_orderkey in (
select
l_orderkey
from
lineitem
group by
l_orderkey
having
sum(l_quantity) > 314
)
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice
order by
o_totalprice desc,
o_orderdate
) where rownum<=100;
Let’s examine the SQL statement. If we name the following subquery
select
l_orderkey, sum(l_quantity) lq
from
lineitem
group by
l_orderkey
view lo, the body of the original query statement can be written as:
select * from (
select
c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, lq
from
customer, orders, lo
where
c_custkey = o_custkey
and o_orderkey = l_orderkey
and lq > 314
order by
o_totalprice desc,
o_orderdate
) where rownum<=100;
Now the computation on the view lo is a grouping & aggregation operation, and the result can be regarded as a table with l_orderkey as its primary key. Thus, the main query is equivalent to associating the fact tables orders and lo, then associating the dimension table customer, and finally performing conditional filtering and sorting.
1. Data storage
Store orders and lineitem in order by primary key, which can implementorderedmerging. Store customer in order byprimary key.
Continue to use orders.ctx, lineitem.ctx and customer.ctx from Q3. Copy them to the main directory of this query.
2. General method
Since both orders and lo are ordered by the primary key orderkey, they are equivalent to being homo dimension table to each other, allowing for order-based merge join.
The result set of taking the top 100 after ordered merging and aggregation is equivalent to a small fact table, which will be joined with customer. We need to use the big dimension table search method mentioned in previous articles.
Calculation code:
A |
|
1 |
=now() |
2 |
>quantity=314 |
3 |
=file("lineitem.ctx").open().cursor@m(L_ORDERKEY,L_QUANTITY) |
4 |
=A3.group@s(L_ORDERKEY;sum(L_QUANTITY):quantities).select(quantities>quantity) |
5 |
=file("orders.ctx").open().new(A4,O_ORDERKEY,O_CUSTKEY,O_TOTALPRICE,O_ORDERDATE,quantities) |
6 |
=A5.total(top(100;-O_TOTALPRICE,O_ORDERDATE)) |
7 |
=file("customer.ctx").open() |
8 |
=A6.joinx@q(O_CUSTKEY,A7:C_CUSTKEY,C_NAME) |
9 |
=interval@ms(A1,now()) |
A8 uses the joinx@q function explained in previous articles to deduplicate O_CUSTKEY in the result of A6 and perform a batch search in customer table.
Test result:
Test items |
Execution time (seconds) |
General method |
9 |
It should be noted that the dimension table customer will not appear in the ordered merge calculation of fact tables before A7; the join after A7 is a search operation on big dimension table. Throughout this process, the dimension table primary key sequence-numberization method cannot be used, so there is no need to improve performance through data conversion in this query.
3. Column-wise computing
Perform column-wise computing based on ordered merge of fact tables and big dimension table search.
Calculation code:
A |
|
1 |
=now() |
2 |
>quantity=314 |
3 |
=file("lineitem.ctx").open().cursor@mv(L_ORDERKEY,L_QUANTITY) |
4 |
=A3.group@s(L_ORDERKEY;sum(L_QUANTITY):quantities).select@v(quantities>quantity) |
5 |
=file("orders.ctx").open().new(A4,O_ORDERKEY,O_CUSTKEY,O_TOTALPRICE,O_ORDERDATE,quantities) |
6 |
=A5.total(top(100;-O_TOTALPRICE,O_ORDERDATE)) |
7 |
=file("customer.ctx").open() |
8 |
=A6.joinx@q(O_CUSTKEY,A7:C_CUSTKEY,C_NAME) |
9 |
=interval@ms(A1,now()) |
A3 gets the columnar cursor.
A4 uses the group function to do column-wise computing, and then select@v also performs column-wise computing. The result will return a columnar cursor.
Test result:
Test items |
Execution time (seconds) |
General method |
9 |
Column-wise computing |
2 |
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL
Chinese version