Performance Optimization - 5.5 [Ordered traversal] First-half ordered grouping

 

Performance Optimization - 5.4 [Ordered traversal] Program cursor

Now we want to change the order of the account transaction table containing account and date information in a way that sorts the transaction records under an account by the day of the week when they occurred, that is, put all the transactions occurred on Sunday together, and then put together the transactions on Monday, Tuesday, and so on. Essentially, this operation changes the sorting method of original table, and it is a big sorting operation. If using the big sorting algorithm, it will need to generate buffer files, and the performance will not be good.

However, we found that the original data table is already sorted by account, so it can be considered a “half ordered” table, and the amount of unordered data within each (ordered) account is not large. At this point, we can use the program cursor mentioned above to perform this large sorting.

A B C
1 func =file(“trades.btx”).cursor@b()
2 for B1;id return B2.sort(day@w(dt))
3 return cursor@c(A1)

Fetching the data of each account and then only sorting the “second-half” data allows the program cursor to collect the results and return the desired big sorting result.

Of course, this method can also be used for grouping. For example, if we want to calculate the total transaction amount of each account on a day of each week, we just need to change the above A3 to:

return cursor@c(A1).group(id,day@w(dt);sum(amount))

Alternatively, we can handle it in the subprogram, which will return less data:

A B C
1 func =file(“trades.btx”).cursor@b()
2 for B1;id return B2.groups(id,day@w(dt);sum(amount))
3 return cursor@c(A1)

This situation is not uncommon in reality. SPL directly provides an option at the group() function:

A
1 =file(“trades.btx”).cursor@b()
2 =A1.group@q(id;day@w(dt);sum(amount))

Note that group@q()has three groups of parameters. The first represents the ordered grouping key, the second represents the unordered grouping key, and the third represents the aggregate expression. By adding the @q option, SPL will know that it should firstly use the ordered cursor to fetch the data with same id, and then group by day@w(dt).

It can also be used directly for sorting:

A
1 =file(“trades.btx”).cursor@b()
2 =A1.group@qs(id;day@w(dt))

Adding the @s option means to sort only without grouping, that is, to sort the data with the same id by day@w(dt).


Performance Optimization - 5.6 [Ordered traversal] Second-half ordered grouping
Performance Optimization - Preface