4.1 Storing dimension table in memory
4.1.1 Single-level dimension table
SQL
SELECT Employees.Region,sum(Orders.Amount)
FROM Orders
LEFT JOIN Employees ON Orders.EmployeeID=Employees.EmployeeID
WHERE OrderDate>='2021-01-01' and OrderDate<='2021-12-31'
GROUP BY Employees.Region
SPL
A | |
---|---|
1 | >st=date(“2021-01-01”), et=date(“2021-12-31”), start=days@o(st), end=days@o(et) |
2 | >Employees=file(“Employees.btx”).import@b().keys@i(EmployeeID) |
3 | =file(“Orders_Time.ctx”).open().cursor@mx(EmployeeID,Amount;OrderDate>=start && OrderDate<=end) |
4 | =A3.switch(EmployeeID,Employees:EmployeeID) |
5 | =A4.groups(EmployeeID.Region;sum(Amount):Amount) |
A2 Retrieve dimension table data from bin file Employees.btx into memory, set primary key and create index on it.
A4 Associate A3’s cursor with the in-memory dimension table Employees.
A5 Perform small-result grouping on A3’s associated cursor.
To use switch() to associate the dimension table, we need to set primary key on the dimension table in advance, through which the association is established. Once the association is created, values of the fact table’s join field are converted to references of corresponding dimension table records. Then we can refence any field of the dimension table using “fact table field.dimension table field”.
The dimension table is often reused. As it is generally not large, we can load it to the memory beforehand, create index for it on the primary key, and store it in a global variable. This saves us the trouble of retrieving the dimension table and creating index on the primary key during summarization by using the global variable directly. The method amounts to dividing the above code into two parts – one loads the dimension table as a global variable at the startup of the server, and the other handles data summarization:
The first part (execute once at the startup of the server):
A | |
---|---|
1 | >env(Employees, file(“Employees.btx”).import@b().keys@i(EmployeeID)) |
The second part (data summarization):
A | |
---|---|
1 | >st=date(“2021-01-01”), et=date(“2021-12-31”), start=days@o(st), end=days@o(et) |
2 | =file(“Orders_Time.ctx”).open().cursor@mx(EmployeeID,Amount;OrderDate>=start && OrderDate<=end) |
3 | =A2.switch(EmployeeID,Employees:EmployeeID) |
4 | =A3.groups(EmployeeID.Region;sum(Amount):Amount) |
4.1.2 Pre-association of multilevel dimension tables
SQL
SELECT Suppliers.Region,sum(Orders.Quantity* Orders.Price)
FROM Orders
LEFT JOIN Products ON Orders.ProductID= Products.ProductID
LEFT JOIN Suppliers ON Products.SupplierID=Suppliers.SupplierID
WHERE OrderDate>='2021-01-01' and OrderDate<='2021-12-31'
GROUP BY Suppliers.Region
SPL
We can also load multilevel dimension tables wholly into memory at server startup, store them as global variables, and create index on primary key and association between dimension tables.
A | |
---|---|
1 | >env(Suppliers,file(“Suppliers.btx”).import@b().keys@i(SupplierID)) |
2 | >env(Products,file(“Products.btx”).import@b().keys@i(ProductID)) |
3 | >Products.switch(SupplierID, Suppliers:SupplierID) |
A1 Load Suppliers dimension table and create index on primary key.
A2 Load Products dimension table and create index on primary key.
A3 Create association between Products and Suppliers.
Use the prepared tables and the association result directly during data summarization:
A | |
---|---|
1 | >st=date(“2021-01-01”), et=date(“2021-12-31”), start= days@o(st), end=days@o(et) |
2 | =file(“Orders_Time.ctx”).open().cursor@mx(ProductID,Quantity,Price;OrderDate>=start && OrderDate<=end) |
3 | =A2.switch(ProductID,Products:ProductID) |
4 | =A3.groups(ProductID.SupplierID.Region;sum(Quantity*Price):Amount) |
A3 Associate A2’s cursor with the in-memory dimension table Products.
A4 Perform small-result-set grouping on A3’s associated cursor.
For multilevel dimension tables (Orders table – Products table – Suppliers table) this example uses, we can use the dot operator (.) to reference data level by level, such as “ProductID.SupplierID.Region”, which, from Orders table’s PrductID field, references SupplierID field in its associated dimension table and then Region field in the dimension table associated with SupplierID.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL