Performance optimization skill: Associating Small Fact Table with Big Dimension Table

 

I Problem introduction & Solution

When we handle the primary-sub table association query, we sometimes find that the filtered fact data is small enough to be wholly loaded in memory or slightly exceeds the memory whereas the dimension table to be associated contains a large volume of data, far larger than the available memory space. In this case, if the dimension table is ordered by its key, we can use binary search to locate the small amount of dimension records corresponding to the fact records all at once rather than traverse all records of the dimension table like what the HASH algorithm does, which improves the operation performance efficiently.

This is exactly the algorithm that SPL provides to us, and in the following part well test the approach and compare it with the HASH JOIN algorithm in Oracle.

However, the algorithm is supposed to retrieve all association key values from the fact table, sort them, then collect as many association key values as possible at a time to search in the dimension table. In the processing, it is the searching that takes up most of the execution time, whereas the association itself is not time-consuming. Therefore, the approach is not sensitive to multi-thread in improving performance, then well test it in both single thread and multi-thread. 

II Test environment & computing scenario

The test computer has two Intel2670 CPUs, 2.6G frequency, 16 core in total, 64G memory and an SSD hard disk, where a 16-core virtual machine with 8G memory is set for testing. 

On the virtual machine, we create a dimension table named account which consists of 3 fields (accountid, name and state, respectively) with a total of 5 billion records, and three fact tables with the same structure (trade1, trade2 and trade3) which contain 0.3 million, 3 million and 15 million records respectively with 4 fields – tradedate, outid (account from which money is transferred), receiveid (account to which money is transferred) and amount (transfer amount). The accountid in the account table is the foreign key for the outid field and received field in the fact tables, and both are in one-to-many relationships.

Our test goal is to query the total transfer amount between each two states according to the three fact tables with different data volume, respectively. Then well test the multi-thread performance using the fact table trade2. 

 

III Tests

1. Oracle

SQL query statement:
select   /*+ parallel(n) */
       a1.state,
       a2.state,
       sum(amount) as amount
from
       account a1,
       account a2,
       trade1
where
       outid = a1.accountid
       and receiveid = a2.accountid
group by
       a1.state,
       a2.state
order by
       a1.state,
       a2.state;

/*+ parallel(n) */ is used to test the performance of multi-thread processing, and n is the number of parallel tasks.

2. SPL

SPL script:


A

1

=now()

2

=file(path+"account.ctx").open()

3

=file(path+"trade1.ctx").open().cursor@m(outid,receiveid,amount;;1)

4

=A3.joinx@q(outid,A2:accountid,state:out_state;receiveid,A2:accountid,state:receive_state;5000000)

5

=A4.groups(out_state,receive_state;sum(amount):amount)

6

=interval@s(A1,now())

 

The @q option is added to the joinx function to improve the association performance when the fact data is small, and the dimension data is big. Here the functions last parameter defines the number of records to be retrieved from the fact table cursor at each association. When there is enough available memory space, the greater this value, the better the performance. 

 

3. Test results

Below are the results of tests over different volumes of fact data (Unit: Sec):

 

Record number of a filtered fact table

15 million

3 million

0.3 million

Oracle

17277

2747

421

SPL

517

106

72

 

Here are the results of multi-thread processing tests using trade2 (with 3 million records) (Unit: Sec):

 

Number of threads

1

2

4

8

16

Oracle

2747

1589

1318

1375

1631

SPL

106

109

117

115

109

 

IV Summary

From the test results, in single-thread processing, the SPL algorithm is many times faster than Oracle because it avoids the full traversal of the dimension table.

But in multi-thread processing, the parallel-insensitive SPL algorithm has little impact on performance improvement. On the contrary, Oracle does enhance the performance by using parallel processing, but it is still much slower due to the time-consuming full traversal.