3.2 Order-based filtering
3.2.1 Bin file
SQL
SELECT count (DISTINCT CustomerID) iNum,count(1) Num,EmployeeID,EmployeeName
FROM Orders
WHERE OrderDate>='2022-03-15' and OrderDate<='2022-06-16' and Amount>1000
GROUP BY EmployeeID,EmployeeName
ORDER BY EmployeeID
SPL
A | |
---|---|
1 | >st=date(“2022-03-15”), et=date(“2022-06-16”), start=days@o(st), end=days@o(et) |
2 | =file(“Orders_Time.btx”).iselect@br(start:end,OrderDate; EmployeeID, EmployeeName, CustomerID,Amount).select(Amount>1000).groups(EmployeeID; EmployeeName, count(1):Num, icount(CustomerID):iNum) |
A2 iselect function retrieves records whose OrderDate values fall within the interval of [start,end] from the file ordered by OrderDate and generates a cursor with them. Here [start,end] is a closed interval. @r option means that there are duplicate OrderDate values.
The iselect function cannot work with a parallel processing option. With a scenario involving just one computing task and when the select result set is relatively large, it is not necessarily that the order-based filtering is faster than the parallelly processed full traversal. With a multi-task scenario, usually parallel processing is not feasible (because the multi-CPU must deal with concurrency). And in that case, iselect has big advantages.
As data is arranged in time order, we cannot write Amount>1000 within iselect but can only use select to perform a filtering outside it. The iselect result set is small enough so the approach has little impact on the performance.
3.2.2 Composite table
SQL
SELECT count (DISTINCT CustomerID) iNum,count(1) Num,EmployeeID,EmployeeName
FROM Orders
WHERE OrderDate>='2022-03-15' and OrderDate<='2022-06-16' and Amount>1000
GROUP BY EmployeeID,EmployeeName
ORDER BY EmployeeID
SPL
A | |
---|---|
1 | >st=date(“2022-03-15”), et=date(“2022-06-16”), start=days@o(st), end=days@o(et) |
2 | =file(“Orders_Time.ctx”).open().cursor@mx(OrderDate,CustomerID,EmployeeID,EmployeeName;OrderDate>=start && OrderDate<=end && Amount>1000) |
3 | =A2.groups(EmployeeID; EmployeeName, count(1):Num, icount(CustomerID):iNum) |
A2 Import data falling within the specified range of time. @m option enables parallel processing.
Writing the condition in cursor()during filtering enables automatic block skipping through comparison with the dimension field. The ineligible blocks are directly skipped. The syntax cursor().select(…) does not perform the block skipping action but first scans all blocks.
Unlike a bin file, the composite table supports writing Amount>1000 in cursor() together with the time filtering condition. After block skipping action, eligible data is automatically identified in each block during traversal and filtering is executed.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL