4.3 Wide table
In the previous examples, we use switch to associate dimension tables and get satisfactory performance. But there are still several problems:
1. Once associated, the original foreign key field values are lost, and we need to get the primary key values from the dimension table in order to restore them. This will drag down speed.
In the above example, if we want to get EmployeeID, we need to use EmployeeID. EmployeeID.
2. If no matching record can be found in the dimension table, the foreign key field will become null and completely lose its original value.
3. Sometimes a foreign key consists of multiple fields. In that case, switch function is not applicable.
To solve those issues, SPL offers join function.
4.3.1 Dimension table with single-field primary key
SQL
SELECT Categories.CategoryName,Products.ProductName,sum(Amount) Amount
FROM Orders
LEFT JOIN Products ON Orders.ProductID=Products.ProductID
LEFT JOIN Categories ON Categories.CategoryID=Products.CategoryID
WHERE OrderDate>='2021-01-01' and OrderDate<='2021-12-31'
GROUP BY Categories.CategoryName,Products.ProductName
SPL
A | |
---|---|
1 | > st=date(“2021-01-01”), et=date(“2021-12-31”), start=days@o(st), end=days@o(et) |
2 | >Categories=file(“Categories.btx”).import@b().keys@i(CategoryID) |
3 | =file(“Orders_Time.ctx”).open().cursor@mx(ProductID,Amount;OrderDate>=start && OrderDate<=end) |
4 | =A3.join(ProductID,Products,ProductName,CategoryID; CategoryID,Categories,CategoryName) |
5 | =A4.groups(CategoryName,ProductName;sum(Amount):Amount) |
A4 Use join function to associate Orders records with dimension tables Products and Categories and add ProductName, CategoryID and CategoryName fields to Orders table to form a wide table.
A5 Perform aggregation on the wide table.
The join function adds fields in the original table sequence to store addresses of dimension table records while retaining the original field values. This gets rid of all the above problems. Unlike switch function, join function returns a new table sequence. So, order is important here. For a data table that is both a fact table and a dimension table, first it needs to associate its own dimension table to get a new data table, which will then be associated with the fact table.
4.3.2 Dimension table with multifield primary key
SQL
SELECT Areas.Country,Areas.Region,Areas.City,CityName,Products.ProductName,
sum(Amount) Amount
FROM Orders
LEFT JOIN Products ON Orders.ProductID=Products.ProductID
LEFT JOIN Suppliers ON Suppliers.SupplierID=Products.SupplierID
LEFT JOIN Areas ON Suppliers.Country=Areas.Country
and Suppliers.Region=Areas.Region
and Suppliers.City=Areas.City
WHERE OrderDate>='2021-01-01' and OrderDate<='2021-12-31'
GROUP BY Areas.Country,Areas.Region,Areas.City,CityName,Products.ProductName
SPL
A | |
---|---|
1 | >st=date(“2021-01-01”), et=date(“2021-12-31”), start=days@o(st), end=days@o(et) |
2 | >Areas=file(“Areas.btx”).import@b().keys@i(Country,Region,City) |
3 | =file(“Orders_Time.ctx”).open().cursor@mx(ProductID,Amount;OrderDate>=start && OrderDate<=end) |
4 | =A3.join(ProductID,Products,ProductName,SupplierID.SupplierID; SupplierID,Suppliers,Country,Region,City;Country:Region:City,Areas,CityName) |
5 | =A4.groups(Country,Region,City,ProductName;CityName,sum(Amount):Amount) |
A4 Areas table’s dimension table has a three-field primary key, and join function is used to handle the association with this multi-key dimension table.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL