The significance of ordered storage for high performance

 

Ordered storage means storing the data after sorting them by certain fields. This storage method enables us to implement some high-performance algorithms, and utilize the ordered feature of data to reduce computing complexity, thus greatly improving computing performance.

Index-free direct search

In search calculation, we often need to search for the target record by the equivalence condition of a field. For example, search the order table for a certain order number, or search the transaction table for a certain customer's transaction record. Such search often occurs in online query scenarios, requiring a second-level response speed, and often comes with higher concurrency access. To meet this requirement, a common method is to create an index for the to-be-searched (TBS) field in advance, and then find the location of target record in the original table through the index, and finally fetch the data from original table.

If the data of original table is stored orderly by TBS field, then the binary search can be used in computing. In this way, very high performance can be achieved without resorting to an index, thus implementing index-free search. Assuming that the total amount of data is N, the time complexity of using binary search will be logN (with 2 as the base number), you can see that the larger the amount of data, the more significant the performance improvement.

esProc SPL supports storing the data in a physically ordered manner, which can implement such index-free search. In addition, SPL syntax is very concise. For example, if the data in table Tis stored orderly by primary key id, then the code for searching for the record whose idis 10100 only needs one statement:

=file("T.btx").iselect@b(10100,id).fetch()

In fact, when there are many target records that meet the equivalence condition, if the data of original table is not stored in order, then it is difficult to achieve an extreme performance even if the index is already created. The reason is that the index table is usually ordered by TBS field, and searching the target record in index table will be fast. However, since the original table itself is not stored orderly by TBS field, and the records to be searched for may appear anywhere in the original table, multiple original table locations obtained from index table are not continuous. However, there is a minimum unit for reading hard disk, and usually this unit is much larger than the space occupied by one record. When reading discontinuous data on hard disk, a lot of irrelevant content will be read, which will slow down the search speed. Especially in the case of high concurrency, if each search is a bit slower, the overall performance will be very poor.

If the original table is stored orderly by the TBS field, then it can ensure that the records with the same search value are stored together in a continuous manner. In this way, all data blocks read from hard disk during search are almost the target value, and thus the performance will naturally be improved dramatically. SPL code in this case is also very simple, just add an option @rto the functioniselect.

Although it is slow to pre-sort, it only needs to sort once. If you often need to search a certain table by the equivalence condition of a field, you can sort the table first, and you can achieve a better performance in subsequent searches.

Ordered grouping

Grouping calculation for large data amount is also common, such as bank account statistics, e-commerce funnel analysis and user behavior analysis. Such calculation is characterized as large in total data amount, large in number of groups, but relatively small in data amount of each group. Grouping calculation generally takes place within each group and doesn’t involve the data of other groups. Moreover, this calculation is not always a simple calculation such as summation and averaging, but likely to be a particularly complex algorithm, which requires writing multi-step code to implement. Therefore, we’d better load each group's data into memory separately for computation.

If we pre-sort the data by grouping field and then store them, then we can read one group of data at a time in order during grouping calculation. In this way, the reading of hard disk is continuous, and the performance can be guaranteed. Assuming that the data in the table Tis ordered by the grouping field gidand time field etime, the SPL code for calculating the number of the earliest 3 record types in each group is roughly as follows:


A

1

=file("T.ctx").open().cursor(gid,etime,type;…)

2

=A1.group(gid).select(~.len()>=3)

3

=A2.groups(~(1).type:t1,~(2).type:t2,~(3).type:t3;count(1):c)

For specific scenarios and calculation methods, visit: SQL Performance Enhancement: Count N Earliest Events in each Group.

SPL’s ordered grouping algorithm is also well suited for particularly complex computations such as e-commerce funnel analysis, this algorithm can dramatically reduce the computing complexity and makes it easy toachieve concise code and high performance. For details, visit: How to speed up funnel analysis of e-commerce system.

Merge association

Another scenario where performance problem often takes place is the association of large data table. In this scenario, the database generally adopts the hashing & partitioning algorithm, the complexity of which is quadratic. Moreover, when the database performs the operation of big data in external storage, the hashing & partitioning algorithm will cause the read and write of the buffer files, yet the read and write of hard disk will greatly reduce the computing performance.

In many cases, the large table is associated by primary key or part of primary keys. If a large table is stored in order by primary key, the ordered merge algorithm can be employed to implement association. The complexity of ordered merge algorithm is linear, so the performance of this algorithm is much better than that of hashing & partitioning algorithm. Moreover, the ordered merge algorithm only needs to traverse the two tables in turn, without resorting to external storage to buffer the data, therefore the amount of IO can be greatly reduced, and the performance is significantly improved.

Assuming that tables aand bare pre-stored orderly by primary key, the code to implement the join operation in SPL’s ordered merge algorithm is approximate as follows:


A

1

=file("a.btx").cursor@b()

2

=file("b.btx").cursor@b()

3

=joinx(A1,id;A2,id)

3

=joinx@1(A1,id;A2,id)

3

=joinx@f(A1,id;A2,id)

For more detailed description of ordered merge algorithm, visit: SPL Order-based Merge Join.

Association with big dimension table

If the non-primary-key fields of one table is associated with the primary key field of the other, then we call the former table the fact table, and the latter table the dimension table. If both the dimension table and fact table are large, it is difficult to improve performance when performing association calculation.

To solve this problem, the database generally adopts the hashing & partitioning algorithm, which works in a way that first calculate the hash value of the association key in two tables respectively, and then divide the data whose hash value is within a certain range into a partition so as to form the buffer data in external storage, and ensure that each partitionis small enough to be loaded into memory, and finally execute the in-memory join algorithm for each pair of partitions (two tables) one by one. Since this algorithm will split and buffer the data of both large tables, it can also be called two-side partitioningalgorithm. When we are in bad luck with hash function, a second round of hashing may be required as a result of a certain too large partition.

If the dimensional table is stored in order by primary key, the one-side partitioningalgorithm can be adopted to implement its association. The advantages of this algorithm include: i)it can divide the dimension table into equal segments, and hence it avoids the second round of hashing and partitioning caused by the bad luck with hash function; ii) since the amount of data to be buffered is much less than that of two-side partitioning algorithm, the performance is better. SPL code is roughly as follows:


A

1

=file("customer.btx")

2

=file("orders.btx").cursor@b(c_id,amount)

3

=A2.joinx@u(c_id,A1:cid,area,discount)

4

=A3.groups(area;amount*discount)

For the detailed introduction to the principle of one-side partitioning algorithm, visit: One-side partitioning algorithm of SPL.

Redundant sorting

In many cases, it is not enough to orderly store the data only by one way. For example, for the search calculation described earlier, the calculation performance is good when the data is stored in order by TBS field, however, the performance is not satisfactory when the data is filtered by another non-ordered-field. Theoretically, each field may be used for filtering, but if the data is sorted by every field, it’s equivalent to copying the data several times, resulting in a high storage cost.

To reduce redundancy, we can store two copies of dataset, one copy is sorted by fields F1…, Fnand stored, and the other copy is sorted by Fn…,F1and stored. In this way, although the amount of data is doubled, it is still acceptable. For any field F, there will always be a dataset where Fis in the first half of sorted dimension list. When searching, even if the TBS field is not the first sorted field, and the filtered data cannot concatenate into a single area in general, the data are still composed of some relatively large continuous areas. The closer to the top the position of field in the sorted field list, the higher the degree of physical order of data after filtering will be.

When the same data is redundantly stored in multiple copies by different sorting ways, SPL's functioncgroupswill select the most appropriate copy for calculation based on the filter criteria field. For details, visit: Multidimensional Analysis Backend Practice 4: Pre-aggregate and Redundant Sorting.

In practice, we can also store the data orderly by other ways, or code in SPL to artificially select appropriately ordered data for calculation.