Performance Optimization - 6.5 [Foreign key association] Index reuse

 

Performance Optimization - 6.4 [Foreign key association] Inner join syntax

Sometimes the dimension table will be filtered first and then associated. For example, we just want to know the products from a certain place of origin:

A
1 =file(“product.btx”).import@b().keys@i(id)
2 =file(“orders.btx”).cursor@b(p_id,quantity)
3 =A2.switch@i(p_id,A1)
4 =A3.select(p_id.state==“CA”)
5 =A4.groups(p_id.vendor;sum(p_id.price*quantity))

In this way, the amount of calculation will be relatively large, and in A4, it needs to repeatedly take out the p_id.state to do judgement. If the dimension table A1 is filtered first, and only the required records are left, then the inappropriate records in the fact table will be filtered by switch@i() in A3, and the performance will be better.

A
1 =file(“product.btx”).import@b().select(state==“CA”)
2 =file(“orders.btx”).cursor@b(p_id,quantity)
3 =A2.switch@i(p_id,A1)
4 =A3.groups(p_id.vendor;sum(p_id.price*quantity))

The calculation amount of this code will be much smaller than the previous one.

However, here comes a problem: after the dimension table is filtered and becomes a record sequence, its original index cannot be used. At this time, an index will be rebuilt when executing the switch(), and building an index also takes time such as calculating the hash value of primary key, and the time to build an index is not short when there are a large number of records. Moreover, because the filtering condition is unpredictable, the index cannot be prepared in advance.

In fact, the index on the original table can still be used. At least the hash value does not need to be recalculated, the only thing we need to do is to delete the filtered records from the index table. SPL provides this mechanism:

A
1 =file(“product.btx”).import@b().keys@i(id)
2 =file(“orders.btx”).cursor@b(p_id,quantity)
3 =A1.select@i(state==“CA”)
4 =A2.switch@i(p_id,A3)
5 =A4.groups(p_id.vendor;sum(p_id.price*quantity))

select@i() will reuse the index of original table on the filtered record sequence.

The reuse of index is not always faster, because the filtered records need to be deleted from the index table. If the number of filtered records is large (the remaining is small), this action will not be fast. In this case, the method of rebuilding the index may be faster. Which method to use depends on the actual situation.


Performance Optimization - 6.6 [Foreign key association] Aligned sequence
Performance Optimization - Preface