4.5 Filtering on dimension table
4.5.1 Index reuse after filtering
SQL
SELECT Suppliers.Region,sum(Amount) Amount
FROM Orders
LEFT JOIN Products ON Orders.ProductID= Products.ProductID
LEFT JOIN Suppliers ON Products.SupplierID=Suppliers.SupplierID
WHERE OrderDate>='2021-01-01' and OrderDate<='2021-12-31' and Products.CategoryID not in (1,2,3)
GROUP BY Suppliers.Region
SPL
A | |
---|---|
1 | >st=date(“2021-01-01”), et=date(“2021-12-31”), start= days@o(st), end=days@o(et) |
2 | =Products.select@i([1,2,3].pos@b(CategoryID)==null) |
3 | =file(“Orders_Time.ctx”).open().cursor@mx(ProductID, Amount; OrderDate>=start && OrderDate<=end, ProductID:A2) |
4 | =A3.groups(ProductID.SupplierID.Region;sum(Amount):Amount) |
A2 Use @i option to reuse index during filtering on dimension table.
A3 Put association with dimension table in the cursor generation statement, and records that do not have matches won’t be generated, reducing time of generating records.
Yet index reuse isn’t always faster. As the ineligible records need to be deleted from the index table, the deletion is not fast when a lot of records are deleted (the number of eligible records is very small). When there are a few eligible records, it is probably that re-creating index is faster. Select the right way according to different scenarios.
4.5.2 Re-creating index after filtering
SQL
SELECT Suppliers.Region,sum(Amount) Amount
FROM Orders
LEFT JOIN Products ON Orders.ProductID= Products.ProductID
LEFT JOIN Suppliers ON Products.SupplierID=Suppliers.SupplierID
WHERE OrderDate>='2021-01-01' and OrderDate<='2021-12-31' and Products.CategoryID in (1,2,3)
GROUP BY Suppliers.Region
SPL
A | |
---|---|
1 | >st=date(“2021-01-01”), et=date(“2021-12-31”), start=days@o(st), end=days@o(et) |
2 | =Products.select([1,2,3].pos@b(CategoryID)!=null).derive@o().keys@i(ProductID) |
3 | =file(“Orders_Time.ctx”).open().cursor@mx(ProductID,Amount;OrderDate>=start && OrderDate<=end,ProductID:A2) |
4 | =A3.groups(ProductID.SupplierID.Region;sum(Amount):Amount) |
A2 The dimension filtering action returns a sequence. We use derive@o() to restore it to a table sequence and re-create index.
In this example, the number of filtered records is large (the number of records left is small) and re-creating index is fast.
4.5.3 Filtering on multilevel dimension tables
SQL
SELECT Suppliers.Region,sum(Amount) Amount
FROM Orders
INNER JOIN Products ON Orders.ProductID= Products.ProductID
INNER JOIN Suppliers ON Products.SupplierID=Suppliers.SupplierID
WHERE OrderDate>='2021-01-01' and OrderDate<='2021-12-31'
and Suppliers.SupplierID in (1,2,3) and Products.CategoryID in (1,2,3,4,5)
GROUP BY Suppliers.Region
SPL
A filtering on multilevel dimension tables requires filtering each dimension table separately before creating association between them. The association only keeps matching records.
A | |
---|---|
1 | =Suppliers.select@i([1,2,3].pos@b(SupplierID)!=null) |
2 | =Products.select@i([1,2,3,4,5].pos@b(CategoryID)!=null) |
3 | =A2.switch@i(SupplierID, A1:SupplierID) |
A1 Filter dimension table Suppliers.
A2 Filter dimension table Products.
A3 Associate Products and Suppliers, where @i option enables deleting non-matching records.
Use the association result directly for summarization:
A | |
---|---|
… | /Here is the above code |
4 | >st=date(“2021-01-01”), et=date(“2021-12-31”), start=days@o(st), end=days@o(et) |
5 | =file(“Orders_Time.ctx”).open().cursor@mx(ProductID, Amount; OrderDate>=start && OrderDate<=end, ProductID:A3) |
6 | =A5.groups(ProductID.SupplierID.Region; sum(Amount):Amount) |
A5 Associate with the filtered dimension table and do not generate non-matching records.
4.5.4 Inverse filtering
SQL
SELECT sum(Amount)
FROM Orders
WHERE OrderDate>='2021-01-01' and OrderDate<='2021-12-31' and EmployeeID not in (
SELECT EmployeeID
FROM Employees
WHERE Title='Sales Representative')
SPL
A | |
---|---|
1 | >st=date(“2021-01-01”), et=date(“2021-12-31”), start=days@o(st), end=days@o(et) |
2 | >Employees=file(“Employees.btx”).import@b().select(Title==“Sales Representative”).derive().keys@i(EmployeeID) |
3 | =file(“Orders_Time.ctx”).open().cursor@mx(Amount;OrderDate>=start && OrderDate<=end,EmployeeID:Employees:null) |
4 | =A3.groups(;sum(Amount)) |
A3 Associate with dimension table in the composite table cursor. The syntax :null means selecting non-matching records only.
When the fact table is not a composite table, we can use @d option in switch to select non-matching records.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL