pjf RaqForum 22 No.
1 Reply • 696 View • 5 Years ago

Learn performance optimization skills from TPCH tests - Q22

I. Query Requirement

Q22 identifies the geographies where there are customers likely to make a purchase. The query counts the customers in the specified country codes whose account balance is above the average positive balance but who have not placed orders for seven years. It also reflects the purchasing intention of ordinary consumers.

Q22 is characterized by a two-table join with grouping, sorting, aggregation, a sub-query, and a NOT EXISTS sub-query.

II. Oracle Execution

The SQL query written for Oracle is as follows:

    select /*+ parallel(n) */
        cntrycode,
        count(*) as numcust,
        sum(c_acctbal) as totacctbal
    from
        (
            select
                substr(c_phone,1,2) as cntrycode,
                c_acctbal
            from
                customer
            where
                substr(c_phone,1,2) in
                    ('11', '14', '15', '19', '20', '21', '23')
                and c_acctbal > (
                    select
                        avg(c_acctbal)
                    from
                        customer
                    where
                        c_acctbal > 0.00
                        and substr(c_phone,1,2) in
                            ('11', '14', '15', '19', '20', '21', '23')
                )
                and not exists (
                    select
                        *
                    from
                        orders
                    where
                        o_custkey = c_custkey
                )
        ) custsale
    group by
        cntrycode
    order by
        cntrycode;

Here, /*+ parallel(n) */ is Oracle's parallel query hint, and n is the degree of parallelism.

Script execution time (unit: seconds)

Degree of parallelism	1	2	4	8	12
Oracle	128	86	70	53	48

III. SPL Optimization

The outer layer of this query is a conventional grouping and aggregation; the main complexity lies in the two sub-queries used as conditions in the inner layer. Each of the two sub-queries corresponds to a set of CUSTOMER records that satisfy its condition. Because the second sub-query is negated with NOT EXISTS, the final result is the set difference between the two sets of CUSTOMER records.
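To make the set-difference view concrete, here is a minimal Python sketch on made-up toy data. The function name `q22`, the row layout, and all sample values are assumptions for illustration only; they are not part of the TPC-H data or of the scripts in this post.

```python
from collections import defaultdict


def q22(customers, order_custkeys, codes):
    """Toy version of Q22.

    customers: iterable of (c_custkey, c_phone, c_acctbal) tuples (hypothetical layout)
    order_custkeys: iterable of o_custkey values, one per order
    codes: set of two-character country codes
    """
    # Customers in the requested country codes.
    in_codes = [c for c in customers if c[1][:2] in codes]

    # First sub-query: average of the positive balances among those customers.
    positive = [c[2] for c in in_codes if c[2] > 0.0]
    avg_bal = sum(positive) / len(positive)

    # Set 1: customers whose balance is above that average.
    candidates = [c for c in in_codes if c[2] > avg_bal]

    # Set 2: customers that have at least one order.
    with_orders = set(order_custkeys)

    # NOT EXISTS becomes a set difference: set 1 minus set 2.
    remaining = [c for c in candidates if c[0] not in with_orders]

    # Outer layer: group by country code, count and sum, then sort.
    groups = defaultdict(lambda: [0, 0.0])
    for _, phone, bal in remaining:
        groups[phone[:2]][0] += 1
        groups[phone[:2]][1] += bal
    return [(code, n, total) for code, (n, total) in sorted(groups.items())]


# Hypothetical sample rows, chosen so the average positive balance is 600.
customers = [
    (1, "11-100", 500.0),
    (2, "11-200", 100.0),
    (3, "14-300", 900.0),
    (4, "14-400", 700.0),
    (5, "30-500", 999.0),  # country code outside the list
    (6, "15-600", -50.0),  # negative balance, excluded from the average
    (7, "11-700", 800.0),
]
order_custkeys = [3, 3, 2]
codes = {"11", "14", "15", "19", "20", "21", "23"}

result = q22(customers, order_custkeys, codes)
print(result)  # [('11', 1, 800.0), ('14', 1, 700.0)]
```

Customers 3, 4 and 7 are above the average, customer 3 has orders, so only 4 and 7 remain before grouping.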

The SPL script is as follows:

	A
1	=now()
2	=["11","14","15","19","20","21","23"]
3	=file("customer.ctx").open().cursor@m(C_CUSTKEY,C_PHONE,C_ACCTBAL;C_ACCTBAL>0.0 && A2.contain(left(C_PHONE,2))).fetch()
4	=A3.avg(C_ACCTBAL)
5	=A3.select(C_ACCTBAL>A4).derive@o().keys@i(C_CUSTKEY)
6	=file("orders.ctx").open().cursor@m(O_CUSTKEY;A5.find(O_CUSTKEY))
7	=A6.groups(O_CUSTKEY:C_CUSTKEY)
8	=[A5,A7].merge@d(C_CUSTKEY)
9	=A8.groups(left(C_PHONE,2):cntrycode;count(1):numcust, sum(C_ACCTBAL):totacctbal)
10	return interval@ms(A1,now())

Grouping results in SPL are ordered, so A7 is ordered by C_CUSTKEY. A5 comes from the CUSTOMER table, which is also ordered by C_CUSTKEY. Because both sets are ordered, a merge algorithm can compute their set difference at high speed (in A8).

Script execution time (unit: seconds)

Degree of parallelism	1	2	4	8	12
Oracle	128	86	70	53	48
SPL composite table	102	53	29	24	19
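The idea of a merge-based difference on two ordered key lists can be sketched in Python as below. This is only an illustration of the general two-pointer technique, not SPL's actual implementation of merge@d; the function name `merge_difference` and the sample keys are assumptions.

```python
def merge_difference(left, right):
    """Return the keys of sorted list `left` that do not appear in sorted list `right`.

    Both inputs must be sorted ascending. Each list is walked once,
    so the cost is linear in len(left) + len(right).
    """
    result = []
    j = 0
    for key in left:
        # Advance the right-hand pointer past keys smaller than the current one.
        while j < len(right) and right[j] < key:
            j += 1
        # Keep the key only if the right-hand side has no equal key.
        if j == len(right) or right[j] != key:
            result.append(key)
    return result


# Hypothetical keys: candidate customers (like A5) and customers with orders (like A7).
a5_keys = [3, 4, 7, 9, 12]
a7_keys = [3, 9]
print(merge_difference(a5_keys, a7_keys))  # [4, 7, 12]
```

Because neither input has to be hashed or re-sorted, this style of difference only needs sequential passes over data that is already in key order.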

SPL Official Website 👉 https://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL

SPL Learning Material 👉 https://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/2bkGwqTj

Youtube 👉 https://www.youtube.com/@esProc_SPL
