3.5 Order-based COUNT DISTINCT
SQL
SELECT count (DISTINCT CustomerID) iNum,count(1) Num
FROM Orders
WHERE OrderDate>='2021-01-01' and OrderDate<='2021-12-31'
SPL
A | |
---|---|
1 | >st=date(“2021-01-01”), et=date(“2021-12-31”), start=days@o(st), end=days@o(et) |
2 | =file(“Orders_Account.ctx”).open().cursor@mx(CustomerID;OrderDate>=start && OrderDate <=end;2) |
3 | =A2.groups(;count(1):Num, icount@o(CustomerID):iNum) |
A2 With @p option used during the data preparation phase, we can just use @m option to accomplish the parallel computation.
A3 icount() works with @o option to perform order-based count distinct, during which there is no need to retain intermediate buffer in the memory and only the comparison with the previous record is enough. By default, the function takes the distinct field as unordered and should keep intermediate buffer in the memory. In that case, memory overflow occurs when buffer exceeds the memory capacity.
SQL
SELECT count(DISTINCT CustomerID),month(OrderDate) months
FROM Orders
WHERE OrderDate>='2021-01-01' and OrderDate<='2021-12-31'
GROUP BY month(OrderDate)
SPL
A | |
---|---|
1 | >st=date(“2021-01-01”), et=date(“2021-12-31”), start=days@o(st), end=days@o(et) |
2 | =file(“Orders_Account.ctx”).open().cursor@mx(CustomerID, OrderDate;OrderDate>=start && OrderDate <=end;2) |
3 | =A2.groups(month(OrderDate):months; icount@o(CustomerID):iNum) |
A3 groups function plus icount function is used to compute count distinct on grouped subsets, without first grouping data.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL