Performance optimization skill: Associating Small Fact Table with Big Dimension Table
I Problem introduction & Solution
When we handle the primary-sub table association query, we sometimes find that the filtered fact data is small enough to be wholly loaded in memory or slightly exceeds the memory whereas the dimension table to be associated contains a large volume of data, far larger than the available memory space. In this case, if the dimension table is ordered by its key, we can use binary search to locate the small amount of dimension records corresponding to the fact records all at once rather than traverse all records of the dimension table like what the HASH algorithm does, which improves the operation performance efficiently.
This is exactly the algorithm that SPL provides to us, and in the following part we’ll test the approach and compare it with the HASH JOIN algorithm in Oracle.
However, the algorithm is supposed to retrieve all association key values from the fact table, sort them, then collect as many association key values as possible at a time to search in the dimension table. In the processing, it is the searching that takes up most of the execution time, whereas the association itself is not time-consuming. Therefore, the approach is not sensitive to multi-thread in improving performance, then we’ll test it in both single thread and multi-thread.
II Test environment & computing scenario
The test computer has two Intel2670 CPUs, 2.6G frequency, 16 core in total, 64G memory and an SSD hard disk, where a 16-core virtual machine with 8G memory is set for testing.
On the virtual machine, we create a dimension table named account which consists of 3 fields (accountid, name and state, respectively) with a total of 5 billion records, and three fact tables with the same structure (trade1, trade2 and trade3) which contain 0.3 million, 3 million and 15 million records respectively with 4 fields – tradedate, outid (account from which money is transferred), receiveid (account to which money is transferred) and amount (transfer amount). The accountid in the account table is the foreign key for the outid field and received field in the fact tables, and both are in one-to-many relationships.
Our test goal is to query the total transfer amount between each two states according to the three fact tables with different data volume, respectively. Then we’ll test the multi-thread performance using the fact table trade2.
III Tests
1. Oracle
SQL query statement:
select /*+ parallel(n) */
a1.state,
a2.state,
sum(amount) as amount
from
account a1,
account a2,
trade1
where
outid = a1.accountid
and receiveid = a2.accountid
group by
a1.state,
a2.state
order by
a1.state,
a2.state;
/*+ parallel(n) */ is used to test the performance of multi-thread processing, and n is the number of parallel tasks.
2. SPL
SPL script:
A |
|
1 |
=now() |
2 |
=file(path+"account.ctx").open() |
3 |
=file(path+"trade1.ctx").open().cursor@m(outid,receiveid,amount;;1) |
4 |
=A3.joinx@q(outid,A2:accountid,state:out_state;receiveid,A2:accountid,state:receive_state;5000000) |
5 |
=A4.groups(out_state,receive_state;sum(amount):amount) |
6 |
=interval@s(A1,now()) |
The @q option is added to the joinx function to improve the association performance when the fact data is small, and the dimension data is big. Here the function’s last parameter defines the number of records to be retrieved from the fact table cursor at each association. When there is enough available memory space, the greater this value, the better the performance.
3. Test results
Below are the results of tests over different volumes of fact data (Unit: Sec):
Record number of a filtered fact table |
15 million |
3 million |
0.3 million |
Oracle |
17277 |
2747 |
421 |
SPL |
517 |
106 |
72 |
Here are the results of multi-thread processing tests using trade2 (with 3 million records) (Unit: Sec):
Number of threads |
1 |
2 |
4 |
8 |
16 |
Oracle |
2747 |
1589 |
1318 |
1375 |
1631 |
SPL |
106 |
109 |
117 |
115 |
109 |
IV Summary
From the test results, in single-thread processing, the SPL algorithm is many times faster than Oracle because it avoids the full traversal of the dimension table.
But in multi-thread processing, the parallel-insensitive SPL algorithm has little impact on performance improvement. On the contrary, Oracle does enhance the performance by using parallel processing, but it is still much slower due to the time-consuming full traversal.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL
Chinese version