Learn performance optimization skills from TPCH tests - Q15
I. Query Requirement
Q15 queries the information of the supplier (ranking first) who contributed the most to the total revenue in a certain period of time, which can be used to decide which first-class suppliers will be given rewards, more orders, special certifications, encouragement and so on.
The characteristic of Q15 is joining ordinary table and view with grouping, sorting, aggregation and aggregation on sub-query.
II. Oracle Execution
The query SQL written in Oracle is as follows:
create view revenue (supplier_no, total_revenue) as
select
l_suppkey,
sum(l_extendedprice * (1 - l_discount))
from
lineitem
where
l_shipdate >= date '1995-04-01'
and l_shipdate < date '1995-04-01' + interval '3' month
group by
l_suppkey;
select /*+ parallel(n) */
s_suppkey,
s_name,
s_address,
s_phone,
total_revenue
from
supplier,
revenue
where
s_suppkey = supplier_no
and total_revenue = (
select
max(total_revenue)
from
revenue
)
order by
s_suppkey;
drop view revenue;
Where /*+ parallel(n) */ is the parallel query syntax of Oracle, and n is the parallel number.
Script execution time, Unit: seconds
Number of parallel |
1 |
2 |
4 |
8 |
12 |
Oracle |
361 |
276 |
218 |
170 |
155 |
III. SPL Optimization
This query is divided into two stages, first calculating the view revenue and then finding the record in revenue where total_revenue reaches its maximum value. The former is a conventional grouping aggregation, using parallel and columnar storage to improve performance. The latter can only be written as a sub-query in SQL, which requires two traversals. SQL does not have reference and set data types. Therefore it cannot directly return the record where the maximum value is located. In contrast, SPL provides such syntax which can return either the maximum value itself or the record where the maximum value is located during one traversal.
The SPL script is as follows:
A |
|
1 |
=now() |
2 |
1995-4-1 |
3 |
=elapse@m(A2,3) |
4 |
=file("lineitem.ctx").open().cursor@m(L_SUPPKEY,L_EXTENDEDPRICE,L_DISCOUNT;L_SHIPDATE>=A2 &&L_SHIPDATE< A3) |
5 |
=A4.groups@n(L_SUPPKEY:supplier_no;sum(L_EXTENDEDPRICE*(1-L_DISCOUNT)):total_revenue) |
6 |
=A5.maxp@a(total_revenue) |
7 |
=file("supplier.ctx").open() |
8 |
=A6.joinx@q(supplier_no,A7:S_SUPPKEY,S_NAME,S_ADDRESS,S_PHONE).fetch() |
9 |
return interval@ms(A1,now()) |
A5 calculates the view revenue, and A6 uses maxp@a to traverse once and return the records with the largest total_revenue, then goes to the supplier table to retrieve other fields, reducing the amount of calculation.
Script execution time, Unit: seconds
Number of parallel |
1 |
2 |
4 |
8 |
12 |
Oracle |
361 |
276 |
218 |
170 |
155 |
SPL composite table |
99 |
52 |
29 |
20 |
18 |
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL