SPL: Dynamic Transposition
The transposition function is commonly presented in the front-end such as reports, which transposes the queried data into a specific display layout. For example, row-to-column transposition, column-to-row transposition, more complex dynamictransposition, etc. Dynamic row-to-column transposition means the fields generated during transposition can not be specified in advance, but can only be determined dynamically according to the values of the original fields. So let’s take a look at how SPL handles the dynamic transposition.
1. Automatically generate a column according to field values
[Example 1] Based on the employee table, calculate the average salary of each department in different areas. Some of the data is shown below:
ID |
NAME |
SURNAME |
STATE |
DEPT |
SALARY |
1 |
Rebecca |
Moore |
California |
R&D |
7000 |
2 |
Ashley |
Wilson |
New York |
Finance |
11000 |
3 |
Rachel |
Johnson |
New Mexico |
Sales |
9000 |
4 |
Emily |
Smith |
Texas |
HR |
7000 |
5 |
Ashley |
Smith |
Texas |
R&D |
16000 |
… |
… |
… |
… |
… |
… |
The layout of the target table is expected to be as follows:
DEPT |
California |
Colorado |
Florida |
… |
Administration |
9333.333 |
… |
||
Finance |
8000 |
5000 |
10000 |
… |
HR |
10000 |
7000 |
… |
|
… |
… |
… |
… |
… |
The row-to-column transposition needs to get its target fields from the original table. The A.pivot() function in SPL can do such a transposition, which will automatically extract names of the target fields when no target fields are specified.
The SPL script is:
A |
|
1 |
=T("Employee.csv") |
2 |
=A1.groups(STATE,DEPT;avg(SALARY):AVG_SALARY) |
3 |
=A2.pivot(DEPT; STATE, AVG_SALARY) |
A1: Import the employee table.
A2: Group the table by STATE and DEPT to calculate the average salary of each department in each state.
A3: Use the A.pivot() function to transpose rows to columns, and retrieve the names of target fields automatically when they are not specified.
2. Dynamically generate column names through calculation
[Example 2] Based on the income details table, get the income information of each employee with categories generated automatically. Below is part of the source table:
NAME |
SOURCE |
INCOME |
David |
Salary |
8000 |
David |
Bonus |
15000 |
Daniel |
Salary |
9000 |
Andrew |
Shares |
26000 |
Andrew |
Sales |
23000 |
… |
… |
… |
The layout of the target table is expected to be as follows:
NAME |
SOURCE1 |
INCOME1 |
SOURCE2 |
INCOME2 |
… |
Andrew |
Shares |
26000 |
Sales |
23000 |
… |
Daniel |
Salary |
9000 |
… |
||
David |
Salary |
8000 |
Bonus |
15000 |
… |
Robert |
Bonus |
13000 |
… |
||
… |
… |
… |
… |
… |
… |
The target fields are not retrieved from a certain field dynamically but need to be calculated dynamically. Here the A.pivot() function is no longer available. We can generate the target data structure according to the group with the most income categories, and then fill it with data.
The SPL script is:
A |
|
1 |
=T("Income.txt").group(NAME) |
2 |
=A1.max(~.len()) |
3 |
=create(NAME, ${A2.("SOURCE"/~/", INCOME"/~).concat@c()}) |
4 |
>A1.(A3.record(~.NAME | ~.conj([SOURCE, INCOME]))) |
A1: Import the income details table, and group it by NAME.
A2: Calculate the maximum number of members in groups, i.e., the largest number of income categories.
A3: Generate column names dynamically according to the result in A3, and create an empty table.
A4: Loop through each group, and fill in the names, income sources, and amounts with the table created in A3.
3. Dynamic row-to-column transposition through table join
[Example 3] Based on the following ORDERS table, ORDER_DETAIL table, and PRODUCT table, generate a table recording information of products purchased each day by each customer in the year 2014. The relationship between ORDERS table and ORDER_DETAIL table is one to many, where each order corresponds to multiple order detail records. The relationship between ORDER_DETAIL table and PRODUCT table is many to one, where the product ID field of ORDER_DETAIL table points to the ID field of PRODUCT table. Below are parts of the three tables:
ORDERS:
ID |
CUSTOMERID |
EMPLOYEEID |
ORDER_DATE |
ARRIVAL_DATE |
10248 |
VINET |
5 |
2012/07/04 |
2012/08/01 |
10249 |
TOMSP |
6 |
2012/07/05 |
2012/08/16 |
10250 |
HANAR |
4 |
2012/07/08 |
2012/08/05 |
10251 |
VICTE |
3 |
2012/07/08 |
2012/08/05 |
10252 |
SUPRD |
4 |
2012/07/09 |
2012/08/06 |
… |
… |
… |
… |
… |
ORDER_DETAIL:
ID |
ORDER_NUMBER |
PRODUCTID |
PRICE |
COUNT |
DISCOUNT |
10814 |
1 |
48 |
102.0 |
8 |
0.15 |
10814 |
2 |
48 |
102.0 |
8 |
0.15 |
10814 |
3 |
48 |
306.0 |
24 |
0.15 |
10814 |
4 |
48 |
102.0 |
8 |
0.15 |
10814 |
5 |
48 |
204.0 |
16 |
0.15 |
… |
… |
… |
… |
… |
… |
PRODUCT:
ID |
NAME |
SUPPLIERID |
CATEGORY |
1 |
Apple Juice |
2 |
1 |
2 |
Milk |
1 |
1 |
3 |
Tomato sauce |
1 |
2 |
4 |
Salt |
2 |
2 |
5 |
Sesame oil |
2 |
2 |
… |
… |
… |
… |
The layout of the target table is expected to be as follows:
ORDER_DATE |
CUSTOMERID |
PRODUCT1 |
COUNT1 |
PRODUCT2 |
COUNT2 |
… |
2014/1/5 |
VICTE |
Corn flakes |
8 |
|||
2014/1/23 |
CONSH |
Chicken |
3 |
Cake |
9 |
|
… |
… |
… |
… |
… |
… |
… |
We can first join up the three source tables through their associative relationships and perform the subsequent calculations using the same method in the previous two examples.
The SPL script is:
A |
|
1 |
=T("Orders.txt").select(year(ORDER_DATE)==2014) |
2 |
=T("Product.txt") |
3 |
=T("OrderDetail.txt").switch(PRODUCTID,A2:ID).group(ID) |
4 |
=join(A1:ORDERS,ID;A3:DETAIL,ID) |
5 |
=create(DATE,CUSTOMERID,${A4.max(DETAIL.len()).("PRODUCT"/~/","/"COUNT"/~).concat@c()}) |
6 |
>A4.run(A5.record([ORDERS.ORDER_DATE,ORDERS.CUSTOMERID]|DETAIL.([PRODUCTID.NAME,COUNT]).conj())) |
A1: Import the Orders table and select records of the year 2014.
A2: Import the Product table.
A3: Import the OrderDetail table, join it with the Product table through the PRODUCTID filed, convert the PRODUCTID filed values to corresponding product records, and then group the joining result by order ID.
A4: Join up Order table with OrderDetail table through order ID.
A5: Generate the target data structure according to the group with the most OrderDetail records, and create an empty table.
A6: Loop through the order data and fill them in the table created in A5 in turn.
In fact, this case is the same as the previous ones essentially with only one more step, that is, joining up the three tables through their associative relationships. The subsequent operations are the same, creating the target data structure as needed and filling it with data.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL
Chinese version