Learn performance optimization skills from TPCH tests - Q20
I. Query Requirement
Q20 is to identify suppliers in a given nation that can provide a more competitive price of a particular part in a certain year. The so-called more competitive suppliers refer to those suppliers who have an excess of a given part available; an excess is defined to be more than 50% of the certain parts that the supplier shipped for a given nation in a given year.
Q20 is characterized as two tables join operations with sorting, aggregation, IN sub-query and ordinary sub-query.
II. Oracle Execution
The query SQL written in Oracle is as follows:
select /*+ parallel(n) */
s_name,s_address
from
supplier,nation
where
s_suppkey in (
select
ps_suppkey
from
partsupp
where
ps_partkey in (
select
p_partkey
from
part
where
p_name like 'bisque%'
)
and ps_availqty > (
select
0.5 * sum(l_quantity)
from
lineitem
where
l_partkey = ps_partkey
and l_suppkey = ps_suppkey
and l_shipdate >= date '1995-01-01'
and l_shipdate < date '1995-01-01' + interval '1' year
)
)
and s_nationkey = n_nationkey
and n_name = 'CHINA'
order by
s_name;
Where /*+ parallel(n) */ is the parallel query syntax of Oracle, and n is the parallel number.
Script execution time, Unit: seconds
Number of parallel |
1 |
2 |
4 |
8 |
12 |
Oracle |
437 |
307 |
242 |
201 |
175 |
III. SPL Optimization
This query looks complex with many layers nested, mainly because SQL does not advocate step-by-step operation. If we divide it into several steps, it will be much clearer:
1. Filter the NATION table according to conditions
2. Filter the PART table according to conditions
3. Match and filter the SUPPLIER table with the result of 1 as a foreign key table
4. Match and filter the PARTSUPP table with the results of 2 and 3 as a foreign key table
5. Use the result of 4 as a foreign key table to be joined on the LINEITEM table, and select the PS_AVAILQTY field, then group and aggregate by L_PARTKEY and L_SUPPKEY. As mentioned earlier, after grouping, the sub-query associated with the primary table with equivalent conditions can be rewritten as a joint statement with the primary table. Select L_SUPPKEY that satisfies the conditions after grouping and aggregation.
6. Use the result of 5 as a foreign key table to filter the SUPPLIER table
The whole process is to generate intermediate foreign key tables to do matching and filtering repeatedly.
The SPL script is as follows:
A |
|
1 |
=now() |
2 |
1995-1-1 |
3 |
=elapse@y(A2,1) |
4 |
>partname="bisque" |
5 |
>nationname="CHINA" |
6 |
=file("nation.btx").import@b().select(N_NAME==nationname).derive@o().keys@i(N_NATIONKEY) |
7 |
=file("part.ctx").open().cursor@m(P_PARTKEY;pos@h(P_NAME,partname)).fetch().keys@im(P_PARTKEY) |
8 |
=file("supplier.ctx").open().cursor@m(S_SUPPKEY,S_NAME,S_ADDRESS;A6.find(S_NATIONKEY)).fetch().keys@im(S_SUPPKEY) |
9 |
=file("partsupp.ctx").open().cursor@m(PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY;A7.find(PS_PARTKEY),A8.find(PS_SUPPKEY)).fetch().keys@im(PS_PARTKEY,PS_SUPPKEY) |
10 |
=file("lineitem.ctx").open().cursor@m(L_PARTKEY,L_SUPPKEY,L_QUANTITY;L_SHIPDATE>=A2 && L_SHIPDATE<A3) |
11 |
=A10.join@i(L_PARTKEY:L_SUPPKEY,A9:PS_PARTKEY:PS_SUPPKEY,PS_AVAILQTY) |
12 |
=A11.groups@u(L_PARTKEY,L_SUPPKEY,PS_AVAILQTY;sum(L_QUANTITY):quantity) |
13 |
=A12.select(PS_AVAILQTY*2>quantity).id(L_SUPPKEY) |
14 |
=A8.switch@i(S_SUPPKEY,A13) |
15 |
=A14.new(S_NAME,S_ADDRESS).sort@o(S_NAME) |
16 |
return interval@ms(A1,now()) |
Script execution time, Unit: seconds
Number of parallel |
1 |
2 |
4 |
8 |
12 |
Oracle |
437 |
307 |
242 |
201 |
175 |
SPL composite table |
128 |
65 |
36 |
21 |
15 |
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL