Performance optimization case course: TPCH-Q6

 

select
    sum(l_extendedprice * l_discount) as revenue
from
    lineitem
where
    l_shipdate >= date '1995-01-01'
    and l_shipdate < date '1995-01-01' + interval '1' year
    and l_discount between 0.05 - 0.01 and 0.05 + 0.01
and l_quantity < 24;

This is an aggregation operation after a single table is filtered.

1. Data storage

There is no special sorting requirement, just store lineitem in order by the primary key orderkey. So, we can continue to use lineitem.ctx from Q3. Copy it to the main directory of this query.

2. General method

Calculation code:


A

B

1

=now()


2

1995-1-1

=elapse@y(A2,1)

3

=0.05-0.01

=0.05+0.01

4

>quantity=24

5

=file("lineitem.ctx").open().cursor@m(L_EXTENDEDPRICE,L_DISCOUNT;L_SHIPDATE>=A2 && L_SHIPDATE<B2 && L_DISCOUNT>=A3 && L_DISCOUNT<=B3 && L_QUANTITY<quantity)

6

=A5.total(sum(L_EXTENDEDPRICE*L_DISCOUNT))

7

=interval@ms(A1,now())

Note that the cursor function in A5 cannot directly contain an expression like L_EXTENDEDPRICE*L_DISCOUNT, which should be written in A6.

Test result:

Test items

Execution time (seconds)

General method

6

3. Data conversion

Utilize the conversion method mentioned in Q1: convert date to integers.

Since the L_SHIPDATE in lineitem is already integerized in Q1, we can directly use lineitem6.ctx. Copy it to the main directory of this query.

Calculation code:


A

B

1

=now()


2

1995-1-1


3

=days@o(A2)

=days@o(elapse@y(A2,1))

4

=0.05-0.01

=0.05+0.01

5

>quantity=24

6

=file("lineitem6.ctx").open().cursor@m(L_EXTENDEDPRICE,L_DISCOUNT;L_SHIPDATE>=A3 && L_SHIPDATE<B3 && L_DISCOUNT>=A4 && L_DISCOUNT<=B4 && L_QUANTITY<quantity)

7

=A6.total(sum(L_EXTENDEDPRICE*L_DISCOUNT))

8

=interval@ms(A1,now())

Test result:

Test items

Execution time (seconds)

General method

6

Data conversion

4

4. Column-wise computing

Calculation code:


A

B

1

=now()


2

1995-1-1


3

=days@o(A2)

=days@o(elapse@y(A2,1))

4

=0.05-0.01

=0.05+0.01

5

>quantity=24

6

=file("lineitem6.ctx").open().cursor@m(L_EXTENDEDPRICE,L_DISCOUNT;L_SHIPDATE>=A3 && L_SHIPDATE<B3 && L_DISCOUNT>=A4 && L_DISCOUNT<=B4 && L_QUANTITY<quantity)

7

=A6.total(sum(L_EXTENDEDPRICE*L_DISCOUNT))

8

=interval@ms(A1,now())

Test result:

Test items

Execution time (seconds)

General method

6

Data conversion

4

Column-wise computing

2