"【Abstract】 Data transposition aims to convert queried data into a specified format to display us .."

Hoo RaqForum 19 No.
1 Reply • 579 View • 3 Years ago

Comparison of SQL & SPL: Complicated Static Transposition

【Abstract】

Data transposition aims to convert queried data into a specified format to display using front-end applications, such as reporting tools. There are row-to-column transposition, column-to-row transposition, and more complicated dynamic transposition. This essay focuses on solutions and basic principles of SQL and SPL, the two commonly used programming languages, in handling transposition scenarios, and tries to find the convenient and efficient way for you through sample programs in SQL and SPL.

Let’s take a look at how SQL and SPL handle complicated static transposition scenarios.

1. Multirow-to-Multirow transposition

【Example 1】Based on the following punch-in data table, generate a new table recording the daily activities for each employee. Each person corresponds to seven records per day:

PER_CODE	IN_OUT	DATE	TIME	TYPE
1110263	1	2013-10-11	09:17:14	In
1110263	6	2013-10-11	11:37:00	Break
1110263	5	2013-10-11	11:38:21	Return
1110263	0	2013-10-11	11:43:21	NULL
1110263	6	2013-10-11	13:21:30	Break
1110263	5	2013-10-11	14:25:58	Return
1110263	2	2013-10-11	18:28:55	Out
…	…	…	…	…

Below is the result of the expected layout:

PER_CODE	DATE	IN	OUT	BREAK	RETURN
1110263	2013-10-11	9:17:14	18:28:55	11:37:00	11:38:21
1110263	2013-10-11	9:17:14	18:28:55	13:21:30	14:25:58
…	…	…	…	…	…

SQL solution:

Here multiple records should be combined to do the calculation and return multiple records according to the specified structure. We sort records every 7 rows by PER_CODE and DATE, get TIME field values from rows numbered [1，7，2，3] in turn from each group and make them first group of values under IN, OUT, BREAK, and RETURN fields, and then retrieve TIME field values from rows numbered [1，7，5，6] in turn from each group and make them the second groups of values under these fields. With databases that do not support PIVOT/UNPIVOT, SQL has the following query:

WITH CTE1 AS (

SELECT

PER_CODE,IN_OUT,"DATE","TIME",TYPE,

MOD(ROWNUM-1,7)+1 GROUP_ORDER

FROM DAILY_TIME

ORDER BY PER_CODE,"DATE","TIME"

)

SELECT * FROM (

SELECT

T_IN.PER_CODE,T_IN."DATE",T_IN."IN",T_OUT."OUT",

T_BREAK.BREAK,T_RETURN."RETURN"

FROM (

SELECT PER_CODE,"DATE","TIME" "IN"

FROM CTE1

WHERE GROUP_ORDER=1

) T_IN

LEFT JOIN (

SELECT PER_CODE,"DATE","TIME" "OUT"

FROM CTE1

WHERE GROUP_ORDER=7

) T_OUT

ON T_IN.PER_CODE=T_OUT.PER_CODE

AND T_IN."DATE"=T_OUT."DATE"

LEFT JOIN (

SELECT PER_CODE,"DATE","TIME" BREAK

FROM CTE1

WHERE GROUP_ORDER=2

) T_BREAK

ON T_IN.PER_CODE=T_BREAK.PER_CODE

AND T_IN."DATE"=T_BREAK."DATE"

LEFT JOIN (

SELECT PER_CODE,"DATE","TIME" "RETURN"

FROM CTE1

WHERE GROUP_ORDER=3

) T_RETURN

ON T_IN.PER_CODE=T_RETURN.PER_CODE

AND T_IN."DATE"=T_RETURN."DATE"

UNION ALL (

SELECT

T_IN.PER_CODE,T_IN."DATE",T_IN."IN",T_OUT."OUT",

T_BREAK.BREAK,T_RETURN."RETURN"

FROM (

SELECT PER_CODE,"DATE","TIME" "IN"

FROM CTE1

WHERE GROUP_ORDER=1

) T_IN

LEFT JOIN (

SELECT PER_CODE,"DATE","TIME" "OUT"

FROM CTE1

WHERE GROUP_ORDER=7

) T_OUT

ON T_IN.PER_CODE=T_OUT.PER_CODE

AND T_IN."DATE"=T_OUT."DATE"

LEFT JOIN (

SELECT PER_CODE,"DATE","TIME" BREAK

FROM CTE1

WHERE GROUP_ORDER=5

) T_BREAK

ON T_IN.PER_CODE=T_BREAK.PER_CODE

AND T_IN."DATE"=T_BREAK."DATE"

LEFT JOIN (

SELECT PER_CODE,"DATE","TIME" "RETURN"

FROM CTE1

WHERE GROUP_ORDER=6

) T_RETURN

ON T_IN.PER_CODE=T_RETURN.PER_CODE

AND T_IN."DATE"=T_RETURN."DATE"

)

ORDER BY PER_CODE,"DATE",BREAK

The query is hard to understand. Take ORACLE 11g as an example, we use PIVOT to simplify the LEFT JOIN part. Then the SQL query is as follows:

WITH CTE1 AS (

SELECT

PER_CODE,IN_OUT,"DATE","TIME",TYPE,

MOD(ROWNUM-1,7)+1 GROUP_ORDER

FROM DAILY_TIME

ORDER BY PER_CODE,"DATE","TIME"

)

SELECT *

FROM (

SELECT *

FROM (

SELECT

PER_CODE,"DATE","TIME",GROUP_ORDER

FROM CTE1

WHERE GROUP_ORDER IN (1,7,2,3)

)

PIVOT(

MIN("TIME") FOR GROUP_ORDER

IN (1 AS "IN",7 AS "OUT",2 AS BREAK,3 AS "RETURN")

)

UNION ALL

(

SELECT *

FROM (

SELECT

PER_CODE,"DATE","TIME",GROUP_ORDER

FROM CTE1

WHERE GROUP_ORDER IN (1,7,5,6)

)

PIVOT(

MIN("TIME") FOR GROUP_ORDER

IN (1 AS "IN",7 AS "OUT",5 AS BREAK,6 AS "RETURN")

)

ORDER BY PER_CODE,"DATE",BREAK

SPL solution:

Though the table structure after transposition is definite, it is still complicated to get this done using A.pivot() function. As an alternative, we create the target data structure instead and then populate data to it.

	A
1	=create(PER_CODE,DATE,IN,OUT,BREAK,RETURN)
2	=T("DailyTime.txt").sort(PER_CODE,DATE,TIME)
3	=A2.group((#-1)\7).(~([1,7,2,3,1,7,5,6]))
4	>A1.record(A3.conj([~.PER_CODE,~.DATE]\|~.(TIME).m([1,2,3,4])\|[~.PER_CODE,~.DATE]\|~.(TIME).m([5,6,7,8])))

A1: Create an empty table according to the specified data structure.

A2: Import the daily punch-in records and sort them by employee code and date.

A3: Group A2’s records every seven rows and, for each group, return records by retrieving values in the order of [1,7,2,3,1,7,5,6].

A4: Concatenate all returned records according to the target order and insert them to A3’s table.

Let’s examine the SPL solution. First we create an empty table according to the target structure, sort original records by grouping them every seven rows, and in each group, get time values from rows [1,7,2,3,1,7,5,6] in turn for the future two records, and finally, populate values to the empty table in sequence. The SQL query is extremely complicated though it uses PIVOT function. PIVOT is not a suitable method for handling this case. Yet SQL has problems in dealing with the task using SPL’s way. SQL’s grouping operation cannot return the post-grouping subsets, which makes it unable to perform further computations on the subset as SPL does. A SQL set is unordered. The language can generate sequence numbers for members in each group through row numbers, but it is inconvenient for it to access members in turn using multiple sequence numbers. SPL, however, can do that.

2. Inter-row calculation during row-to-column transposition

【Example 2】Based on the following user payment detail table, generate a new table storing the payable amount per month for each user in the year 2014. Below is part of the source table:

ID	CUSTOMERID	NAME	UNPAID	ORDER_DATE
112101	C013	CA	12800	2014/02/21
112102	C013	CA	3500	2014/06/15
112103	C013	CA	2600	2015/03/21
…	…	…	…	…

Below is the result of the following layout:

NAME	1	2	3	4	5	6	7	8	9	10	11	12
CA		12800	12800	12800	12800	3500	3500	3500	3500	3500	3500	3500
…

SQL solution:

It is impossible for SQL to implement the general method of creating the target table structure and then populating data to the structure. The language needs to work out a solution for each case. In the last step, PIVOT function can be used to transpose month values into column names. Before that, we need to prepare the original data according to the following format:

NAME	ORDER_MONTH	UNPAID
CA	1
CA	2	12800
CA	3	12800
CA	4	12800
CA	5	12800
CA	6	3500
CA	7	3500
…	…	…

Yet in the original table, not each month has records. What we expect is a table recording the monthly payable amounts for each user in the year 2014. We perform a cross product on the list of unique customers who have the payment records and the months from January to December to generate a table of the target structure. Then we left join this table with the user payment detail table to get the monthly payable amount for each user. Now the table is as follows:

NAME	ORDER_MONTH	UNPAID
CA	1
CA	2	12800
CA	3
CA	4
CA	5
CA	6	3500
CA	7
…	…	…

There is more to do as we expect to populate January’s payable amount, 12,800, to months from March to May, and for customer CA, we want to insert June’s payable amount 3500 to months after July. To do those, we can use the subquery, where we select the previous records containing the payable amounts for a customer if the current month does not have a payable amount and get the one with the latest month. Below is the complete SQL query:

WITH CTE1 AS(

SELECT T1.NAME,T1.ORDER_MONTH,T2.UNPAID

FROM (

SELECT *

FROM (

SELECT DISTINCT CUSTOMERID, NAME

FROM PAYMENT

WHERE EXTRACT (YEAR FROM ORDER_DATE)=2014

)

CROSS JOIN (

SELECT 1 ORDER_MONTH FROM DUAL

UNION ALL SELECT 2 ORDER_MONTH FROM DUAL

UNION ALL SELECT 3 ORDER_MONTH FROM DUAL

UNION ALL SELECT 4 ORDER_MONTH FROM DUAL

UNION ALL SELECT 5 ORDER_MONTH FROM DUAL

UNION ALL SELECT 6 ORDER_MONTH FROM DUAL

UNION ALL SELECT 7 ORDER_MONTH FROM DUAL

UNION ALL SELECT 8 ORDER_MONTH FROM DUAL

UNION ALL SELECT 9 ORDER_MONTH FROM DUAL

UNION ALL SELECT 10 ORDER_MONTH FROM DUAL

UNION ALL SELECT 11 ORDER_MONTH FROM DUAL

UNION ALL SELECT 12 ORDER_MONTH FROM DUAL

)

) T1

LEFT JOIN (

SELECT

CUSTOMERID, NAME,

EXTRACT (MONTH FROM ORDER_DATE) ORDER_MONTH, UNPAID

FROM PAYMENT

WHERE EXTRACT (YEAR FROM ORDER_DATE)=2014

) T2

ON T1.NAME=T2.NAME AND

T1.ORDER_MONTH=T2.ORDER_MONTH

ORDER BY NAME,ORDER_MONTH

CTE2 AS (

SELECT

T1.NAME,T1.ORDER_MONTH,

NVL(T1.UNPAID,

(

SELECT

MIN(UNPAID) KEEP (DENSE_RANK FIRST ORDER BY ORDER_MONTH DESC)

FROM CTE1 T2

WHERE T1.NAME=T2.NAME AND

T2.UNPAID>0 AND

T2.ORDER_MONTH<T1.ORDER_MONTH

)

) UNPAID

FROM CTE1 T1

ORDER BY T1.NAME,T1.ORDER_MONTH

)

SELECT *

FROM CTE2

PIVOT(

MIN(UNPAID) FOR ORDER_MONTH

IN (

1 AS "1",2 AS "2",2 AS "3",

4 AS "4",5 AS "5",6 AS "6",

7 AS "7",8 AS "8",9 AS "9",

10 AS "10",11 AS "11",12 AS "12"

)

The SQL query is too complicated to understand even though we have given a detailed explanation. Now let’s look at how SPL handles the task:

SPL solution:

It is complicated to do this with A.pivot() function. According to the logic in the previous example, we create a target data structure and then populate data to it.

	A
1	=create(NAME,${12.concat@c()})
2	=T("PaymentDetail.txt").select(year(ORDER_DATE)==2014).group(CUSTOMERID)
3	>A2.((m12=12.(null),~.(m12(month(ORDER_DATE))=UNPAID), m12.(~=ifn(~,~[-1])),A1.record(NAME\|m12)))

A1: Create an empty table of the target data structure.

A2: Import the user payment table containing records of the year 2014 and sort it by customer ID.

A3: Loop through each group and for each member in a group calculate the payable amount per month, and then insert the results to A1’s table along with customer names.

The SPL script is concise and, more importantly, clear in logic. For a complicated static transposition task, we can first create the target data structure and then populate data to it. As the SPL set is ordered, it is convenient to perform an inter-row calculation.

3. Present data in horizontal column groups

Presenting data in horizontal column groups is often used for data visualization, where data having same type of attributes is displayed in groups of columns for convenient viewing. Let’s see how SQL and SPL handle this type of transposition through an example.

【Example 3】List names and salaries of employees whose get paid over 10,000 in both sales department and R&D department. Below is part of the employee table:

ID	NAME	SURNAME	STATE	DEPT	SALARY
1	Rebecca	Moore	California	R&D	7000
2	Ashley	Wilson	New York	Finance	11000
3	Rachel	Johnson	New Mexico	Sales	9000
4	Emily	Smith	Texas	HR	7000
5	Ashley	Smith	Texas	R&D	16000
…	…	…	…	…	…

Below is result of the expected layout:

SALESNAME	SALARY	RDNAME	SALARY
Madeline	15000	Ashley	16000
Jacob	12000	Jacob	16000
Andrew	12000	Ryan	13000
…	…	…	…

SQL solution:

First, we perform conditional filtering according to the condition that a record has the department value of sales or R&D and the salary value above 10,000. To display data in groups of columns, we need to get the rows numbers for members in each group and then perform a full join by matching row numbers. Below is the SQL query:

SELECT

T1.NAME SALESNAME, T1.SALARY, T2.NAME RDNAME, T2.SALARY

FROM (

SELECT

NAME,SALARY, ROW_NUMBER()OVER (ORDER BY SALARY DESC) NO

FROM EMPLOYEE

WHERE DEPT='Sales' AND SALARY >10000

) T1

FULL JOIN (

SELECT

NAME,SALARY, ROW_NUMBER()OVER (ORDER BY SALARY DESC) NO

FROM EMPLOYEE

WHERE DEPT='R&D' AND SALARY >10000

) T2

ON T1.NO=T2.NO

SPL solution:

As it handles dynamic transpositions, SPL handles this type of transposition by creating the target structure first and then populating data to it. Below is the SPL script:

	A
1	=T("Employee.csv").select(SALARY >10000).sort@z(SALARY)
2	=A1.select(DEPT:"Sales")
3	=A1.select(DEPT:"R&D")
4	=create('SALESNAME',SALARY,'RDNAME',SALARY)
5	=A4.paste(A2.(NAME),A2.(SALARY),A3.(NAME),A3.(SALARY))

A1: Import Employee data table, select records where salaries are above 10,000, and sort them by salary in descending order.

A2: Get records of sales department.

A3: Get records of R&D department.

A4: Create an empty table of the target data structure.

A5: Use A.paste() function to paste result values to corresponding columns.

We can see that SQL’s static transposition methods PIVOT and UNPIVOT have limited applications and are supported only by certain database products. The SQL query will often be too complicated when it tries to handle certain complex static transposition scenarios. Moreover, SQL lacks a standard method of dealing with them.

SPL provides a flexible and adaptable method for handling various complicated transposition tasks. More importantly, the language has a clear and stable logic. It will first create the target data structure and then populated the calculated result to the table.

In the next essay in the transposition series, we will introduce how SQL and SPL handle dynamic transpositions.

DailyTime.txt

PaymentDetail.txt

Employee.csv

SPL Official Website 👉 https://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL

SPL Learning Material 👉 https://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/cFTcUNs7

Youtube 👉 https://www.youtube.com/@esProc_SPL

esProc