Performance Optimization - 4.9 [Traversal technology] Column-wise computing

 

Performance Optimization - 4.8 [Traversal technology] Redundant grouping key

Structured data usually takes record as basic unit, and a data table is a set of records. The calculation for data table is usually done on a record-by-record basis, that is, the calculation is performed on one record at a time, and then it moves on to the next record. For example, the filtering or grouping operation mentioned earlier requires calculating the filter condition or grouping key for each record, and then immediately performing the corresponding filtering or grouping action based on the calculation results.

Since records are also known as rows of a data table, this computing method is also called row-wise computing.

There is also a field-based computing method, which will complete all relevant calculations for all records in one go before proceeding to the next step. For example, calculate the filter conditions or grouping keys corresponding to all records first, and then perform the corresponding filtering or grouping action.

Since fields are also known as columns of a data table, this computing method is also called column-wise computing.

There is no substantial difference in the number of calculations between row-wise computing and column-wise computing, which means the complexity of these two algorithms is theoretically the same. That being the case, it seems unnecessary to have two computing methods. In fact, it’s still necessary, because the column-wise computing has significant advantages in practice.

Row-wise computing requires data to be stored in a row-based storage, which means storing all fields of a record as one record (object). The computation of structured data is very flexible, and it is possible to compute records of different data structures at any time (the same is true for operations such as select, group, and join). If it needs to design a new data structure each time, the code would become very cumbersome. To solve this, programmers will design a universal data object to store records of various structures. Since the fields of a record usually have different types, it also needs to use the same generic object to store them.

However, using generic object will cause a lot of trouble in storage. A generic object that can store various data types is much more complex than simple data, as it not only needs to store the data itself, but also an additional data type. Different data types occupy different amounts of space, it usually needs a dynamic object to store. However, this not only adds an extra layer, resulting in more actions to access the data, but also occupies much more space, making it more likely to trigger swapping actions of the operating system or virtual machine when memory is insufficient, seriously hindering performance.

Moreover, when it comes to CPU, different types of data need different instructions to perform operation. For example, the instructions for addition of integers and floating-point numbers are different. As a result, when computing, it needs to translate into different CPU instructions based on the data types involved, which adds an extra step of judgment.

The data type of the same field in a structured data table is usually the same. We call the data table that meets this condition the pure table. For pure table, we can use column-wise computing to avoid the extra actions in row-wise computing.

Column-wise computing will adopt the columnar storage in memory, which will store the same column of all records in the data table together. This storage method is similar to the columnar storage of external storage. Because the data type of the same field is the same, we can simply use an array object to store the data of the same field. In this way, the whole data table can be stored with multiple arrays. If a data table has n rows and m columns, then for column-wise storage, only m array objects are needed to store the table, while for row-wise storage, it needs at least n*m generic objects (to store field values) and n array objects (to store records). Therefore, column-wise computing has a huge advantage in memory occupancy.

Moreover, accessing the members of an array is much simpler than accessing generic object. During the calculation process, since the data type in the same column is the same, when calculating by column, it only needs to parse the expression once at the beginning, and then generate different calculation steps according to the data type of columns, without the need to judge the data type for each row, and the actions of CPU will also be much fewer.

Actual tests show that for computationally intensive tasks, column-wise computing can be several to dozens of times faster than row-wise computing. Although they have the same complexity in theory, optimization in practice is also very meaningful.

SPL also provides column-wise computing scheme.

A
1 =file(“orders.btx”).import@b()
2 =A1.i()
3 =A2.groups(area;sum(amount):amount)

Use the i() function to convert the ordinary table sequence to pure table sequence, and then the subsequent calculations will automatically use the column-wise computing scheme.

For composite tables based on columnar storage, you can also first generate a pure cursor directly, and then perform high-speed column-wise computing.

A
1 =file(“orders.ctx”).open()
2 =A1.cursor@v(area,amount)
3 =A2.groups(area;sum(amount):amount)

Using the @v option can generate a pure cursor, and then it will also use the column-wise computing automatically.

Text files and bin files themselves are in row-based storage, and do not support direct generation of pure cursor.

It should be emphasized that column-wise computing can only be performed for pure data table (cursor), and there are some restrictions (if the intermediate results generated during computation are impure or cannot be determined to be pure, then the use of column-wise computing cannot be continued). For more information, refer to the article on the SPL forum. This book focuses on the explanation of algorithm complexity, and most of the examples given in it still use the more general row-wise computing scheme. You can convert them to column-wise computing according to your application environment.


Performance Optimization - 5.1 [Ordered traversal] Ordered grouping and aggregating
Performance Optimization - Preface