Comparison of SQL & SPL: Complicated Static Transposition

 

Abstract

Data transposition aims to convert queried data into a specified format to display using front-end applications, such as reporting tools. There are row-to-column transposition, column-to-row transposition, and more complicated dynamic transposition. This essay focuses on solutions and basic principles of SQL and SPL, the two commonly used programming languages, in handling transposition scenarios, and tries to find the convenient and efficient way for you through sample programs in SQL and SPL.

 

Let’s take a look at how SQL and SPL handle complicated static transposition scenarios.

 

1. Multirow-to-Multirow transposition

Example 1Based on the following punch-in data table, generate a new table recording the daily activities for each employee. Each person corresponds to seven records per day:

PER_CODE

IN_OUT

DATE

TIME

TYPE

1110263

1

2013-10-11

09:17:14

In

1110263

6

2013-10-11

11:37:00

Break

1110263

5

2013-10-11

11:38:21

Return

1110263

0

2013-10-11

11:43:21

NULL

1110263

6

2013-10-11

13:21:30

Break

1110263

5

2013-10-11

14:25:58

Return

1110263

2

2013-10-11

18:28:55

Out

Below is the result of the expected layout:

PER_CODE

DATE

IN

OUT

BREAK

RETURN

1110263

2013-10-11

9:17:14

18:28:55

11:37:00

11:38:21

1110263

2013-10-11

9:17:14

18:28:55

13:21:30

14:25:58

 

 

SQL solution:

   Here multiple records should be combined to do the calculation and return multiple records according to the specified structure. We sort records every 7 rows by PER_CODE and DATE, get TIME field values from rows numbered [1723] in turn from each group and make them first group of values under IN, OUT, BREAK, and RETURN fields, and then retrieve TIME field values from rows numbered [1756] in turn from each group and make them the second groups of values under these fields. With databases that do not support PIVOT/UNPIVOT, SQL has the following query:

 

   WITH CTE1 AS (

      SELECT

         PER_CODE,IN_OUT,"DATE","TIME",TYPE,

         MOD(ROWNUM-1,7)+1 GROUP_ORDER

      FROM DAILY_TIME

      ORDER BY PER_CODE,"DATE","TIME"

   )

   SELECT * FROM (

      SELECT

         T_IN.PER_CODE,T_IN."DATE",T_IN."IN",T_OUT."OUT",

         T_BREAK.BREAK,T_RETURN."RETURN"

      FROM (

         SELECT PER_CODE,"DATE","TIME" "IN"

         FROM CTE1

         WHERE GROUP_ORDER=1

      ) T_IN

      LEFT JOIN (

         SELECT PER_CODE,"DATE","TIME" "OUT"

         FROM CTE1

         WHERE GROUP_ORDER=7

      ) T_OUT

      ON T_IN.PER_CODE=T_OUT.PER_CODE

         AND T_IN."DATE"=T_OUT."DATE"

      LEFT JOIN (

         SELECT PER_CODE,"DATE","TIME" BREAK

         FROM CTE1

         WHERE GROUP_ORDER=2

      ) T_BREAK

      ON T_IN.PER_CODE=T_BREAK.PER_CODE

         AND T_IN."DATE"=T_BREAK."DATE"

      LEFT JOIN (

         SELECT PER_CODE,"DATE","TIME" "RETURN"

         FROM CTE1

         WHERE GROUP_ORDER=3

      ) T_RETURN

      ON T_IN.PER_CODE=T_RETURN.PER_CODE

         AND T_IN."DATE"=T_RETURN."DATE"

      UNION ALL (

         SELECT

            T_IN.PER_CODE,T_IN."DATE",T_IN."IN",T_OUT."OUT",

            T_BREAK.BREAK,T_RETURN."RETURN"

         FROM (

            SELECT PER_CODE,"DATE","TIME" "IN"

            FROM CTE1

            WHERE GROUP_ORDER=1

      ) T_IN

      LEFT JOIN (

         SELECT PER_CODE,"DATE","TIME" "OUT"

         FROM CTE1

         WHERE GROUP_ORDER=7

      ) T_OUT

      ON T_IN.PER_CODE=T_OUT.PER_CODE

         AND T_IN."DATE"=T_OUT."DATE"

      LEFT JOIN (

         SELECT PER_CODE,"DATE","TIME" BREAK

         FROM CTE1

         WHERE GROUP_ORDER=5

      ) T_BREAK

      ON T_IN.PER_CODE=T_BREAK.PER_CODE

         AND T_IN."DATE"=T_BREAK."DATE"

      LEFT JOIN (

         SELECT PER_CODE,"DATE","TIME" "RETURN"

         FROM CTE1

         WHERE GROUP_ORDER=6

      ) T_RETURN

      ON T_IN.PER_CODE=T_RETURN.PER_CODE

         AND T_IN."DATE"=T_RETURN."DATE"

      )

   )

   ORDER BY PER_CODE,"DATE",BREAK

 

The query is hard to understand. Take ORACLE 11g as an example, we use PIVOT to simplify the LEFT JOIN part. Then the SQL query is as follows:

 

   WITH CTE1 AS (

      SELECT

         PER_CODE,IN_OUT,"DATE","TIME",TYPE,

         MOD(ROWNUM-1,7)+1 GROUP_ORDER

      FROM DAILY_TIME

      ORDER BY PER_CODE,"DATE","TIME"

   )

   SELECT *

   FROM (

      SELECT *

      FROM (

         SELECT

            PER_CODE,"DATE","TIME",GROUP_ORDER

         FROM CTE1

         WHERE GROUP_ORDER IN (1,7,2,3)

      )

      PIVOT(

         MIN("TIME") FOR GROUP_ORDER

         IN (1 AS "IN",7 AS "OUT",2 AS BREAK,3 AS "RETURN")

      )

      UNION ALL

      (

         SELECT *

         FROM (

            SELECT

               PER_CODE,"DATE","TIME",GROUP_ORDER

            FROM CTE1

            WHERE GROUP_ORDER IN (1,7,5,6)

         )

         PIVOT(

            MIN("TIME") FOR GROUP_ORDER

            IN (1 AS "IN",7 AS "OUT",5 AS BREAK,6 AS "RETURN")

         )

      )

   )

   ORDER BY PER_CODE,"DATE",BREAK

 

SPL solution:

   Though the table structure after transposition is definite, it is still complicated to get this done using A.pivot() function. As an alternative, we create the target data structure instead and then populate data to it.


A

1

=create(PER_CODE,DATE,IN,OUT,BREAK,RETURN)

2

=T("DailyTime.txt").sort(PER_CODE,DATE,TIME)

3

=A2.group((#-1)\7).(~([1,7,2,3,1,7,5,6]))

4

>A1.record(A3.conj([~.PER_CODE,~.DATE]|~.(TIME).m([1,2,3,4])|[~.PER_CODE,~.DATE]|~.(TIME).m([5,6,7,8])))

A1: Create an empty table according to the specified data structure.

A2: Import the daily punch-in records and sort them by employee code and date.

A3: Group A2’s records every seven rows and, for each group, return records by retrieving values in the order of [1,7,2,3,1,7,5,6].

A4: Concatenate all returned records according to the target order and insert them to A3’s table.

 

 Let’s examine the SPL solution. First we create an empty table according to the target structure, sort original records by grouping them every seven rows, and in each group, get time values from rows [1,7,2,3,1,7,5,6] in turn for the future two records, and finally, populate values to the empty table in sequence. The SQL query is extremely complicated though it uses PIVOT function. PIVOT is not a suitable method for handling this case. Yet SQL has problems in dealing with the task using SPL’s way. SQL’s grouping operation cannot return the post-grouping subsets, which makes it unable to perform further computations on the subset as SPL does. A SQL set is unordered. The language can generate sequence numbers for members in each group through row numbers, but it is inconvenient for it to access members in turn using multiple sequence numbers. SPL, however, can do that.

2. Inter-row calculation during row-to-column transposition

Example 2Based on the following user payment detail table, generate a new table storing the payable amount per month for each user in the year 2014. Below is part of the source table:

ID

CUSTOMERID

NAME

UNPAID

ORDER_DATE

112101

C013

CA

12800

2014/02/21

112102

C013

CA

3500

2014/06/15

112103

C013

CA

2600

2015/03/21

Below is the result of the following layout:

NAME

1

2

3

4

5

6

7

8

9

10

11

12

CA


12800

12800

12800

12800

3500

3500

3500

3500

3500

3500

3500













 

 

SQL solution:

  It is impossible for SQL to implement the general method of creating the target table structure and then populating data to the structure. The language needs to work out a solution for each case. In the last step, PIVOT function can be used to transpose month values into column names. Before that, we need to prepare the original data according to the following format:

NAME

ORDER_MONTH

UNPAID

CA

1


CA

2

12800

CA

3

12800

CA

4

12800

CA

5

12800

CA

6

3500

CA

7

3500

 Yet in the original table, not each month has records. What we expect is a table recording the monthly payable amounts for each user in the year 2014. We perform a cross product on the list of unique customers who have the payment records and the months from January to December to generate a table of the target structure. Then we left join this table with the user payment detail table to get the monthly payable amount for each user. Now the table is as follows:

NAME

ORDER_MONTH

UNPAID

CA

1


CA

2

12800

CA

3


CA

4


CA

5


CA

6

3500

CA

7


 

There is more to do as we expect to populate January’s payable amount, 12,800, to months from March to May, and for customer CA, we want to insert June’s payable amount 3500 to months after July. To do those, we can use the subquery, where we select the previous records containing the payable amounts for a customer if the current month does not have a payable amount and get the one with the latest month. Below is the complete SQL query:

 

   WITH CTE1 AS(

      SELECT T1.NAME,T1.ORDER_MONTH,T2.UNPAID

      FROM (

         SELECT *

         FROM (

            SELECT DISTINCT CUSTOMERID, NAME

            FROM PAYMENT

            WHERE EXTRACT (YEAR FROM ORDER_DATE)=2014

         )

         CROSS JOIN (

            SELECT 1 ORDER_MONTH FROM DUAL

            UNION ALL SELECT 2 ORDER_MONTH FROM DUAL

            UNION ALL SELECT 3 ORDER_MONTH FROM DUAL

            UNION ALL SELECT 4 ORDER_MONTH FROM DUAL

            UNION ALL SELECT 5 ORDER_MONTH FROM DUAL

            UNION ALL SELECT 6 ORDER_MONTH FROM DUAL

            UNION ALL SELECT 7 ORDER_MONTH FROM DUAL

            UNION ALL SELECT 8 ORDER_MONTH FROM DUAL

            UNION ALL SELECT 9 ORDER_MONTH FROM DUAL

            UNION ALL SELECT 10 ORDER_MONTH FROM DUAL

            UNION ALL SELECT 11 ORDER_MONTH FROM DUAL

            UNION ALL SELECT 12 ORDER_MONTH FROM DUAL

         )

      ) T1

      LEFT JOIN (

         SELECT

            CUSTOMERID, NAME,

            EXTRACT (MONTH FROM ORDER_DATE) ORDER_MONTH, UNPAID

         FROM PAYMENT

         WHERE EXTRACT (YEAR FROM ORDER_DATE)=2014

      ) T2

      ON T1.NAME=T2.NAME AND

         T1.ORDER_MONTH=T2.ORDER_MONTH

      ORDER BY NAME,ORDER_MONTH

   ),

   CTE2 AS (

      SELECT

         T1.NAME,T1.ORDER_MONTH,

         NVL(T1.UNPAID,

            (

               SELECT

                  MIN(UNPAID) KEEP (DENSE_RANK FIRST ORDER BY ORDER_MONTH DESC)

               FROM CTE1 T2

               WHERE T1.NAME=T2.NAME AND

                  T2.UNPAID>0 AND

                  T2.ORDER_MONTH<T1.ORDER_MONTH

            )

         ) UNPAID

      FROM CTE1 T1

      ORDER BY T1.NAME,T1.ORDER_MONTH

   )

   SELECT *

   FROM CTE2

   PIVOT(

      MIN(UNPAID) FOR ORDER_MONTH

      IN (

         1 AS "1",2 AS "2",2 AS "3",

         4 AS "4",5 AS "5",6 AS "6",

         7 AS "7",8 AS "8",9 AS "9",

         10 AS "10",11 AS "11",12 AS "12"

      )

   )

 

The SQL query is too complicated to understand even though we have given a detailed explanation. Now let’s look at how SPL handles the task:

 

SPL solution:

   It is complicated to do this with A.pivot() function. According to the logic in the previous example, we create a target data structure and then populate data to it.


A

1

=create(NAME,${12.concat@c()})

2

=T("PaymentDetail.txt").select(year(ORDER_DATE)==2014).group(CUSTOMERID)

3

>A2.((m12=12.(null),~.(m12(month(ORDER_DATE))=UNPAID), m12.(~=ifn(~,~[-1])),A1.record(NAME|m12)))

A1: Create an empty table of the target data structure.

A2: Import the user payment table containing records of the year 2014 and sort it by customer ID.

A3: Loop through each group and for each member in a group calculate the payable amount per month, and then insert the results to A1’s table along with customer names.

 

   The SPL script is concise and, more importantly, clear in logic. For a complicated static transposition task, we can first create the target data structure and then populate data to it. As the SPL set is ordered, it is convenient to perform an inter-row calculation.

 

3. Present data in horizontal column groups

Presenting data in horizontal column groups is often used for data visualization, where data having same type of attributes is displayed in groups of columns for convenient viewing. Let’s see how SQL and SPL handle this type of transposition through an example.

 

Example 3List names and salaries of employees whose get paid over 10,000 in both sales department and R&D department. Below is part of the employee table:

ID

NAME

SURNAME

STATE

DEPT

SALARY

1

Rebecca

Moore

California

R&D

7000

2

Ashley

Wilson

New   York

Finance

11000

3

Rachel

Johnson

New   Mexico

Sales

9000

4

Emily

Smith

Texas

HR

7000

5

Ashley

Smith

Texas

R&D

16000

Below is result of the expected layout:

SALESNAME

SALARY

RDNAME

SALARY

Madeline

15000

Ashley

16000

Jacob

12000

Jacob

16000

Andrew

12000

Ryan

13000

 

SQL solution:

First, we perform conditional filtering according to the condition that a record has the department value of sales or R&D and the salary value above 10,000. To display data in groups of columns, we need to get the rows numbers for members in each group and then perform a full join by matching row numbers. Below is the SQL query:

 

   SELECT

      T1.NAME SALESNAME, T1.SALARY, T2.NAME RDNAME, T2.SALARY

   FROM (

      SELECT

         NAME,SALARY, ROW_NUMBER()OVER (ORDER BY SALARY DESC) NO

      FROM EMPLOYEE

      WHERE DEPT='Sales' AND SALARY >10000

   ) T1

   FULL JOIN (

      SELECT

         NAME,SALARY, ROW_NUMBER()OVER (ORDER BY SALARY DESC) NO

      FROM EMPLOYEE

      WHERE DEPT='R&D' AND SALARY >10000

   ) T2

   ON T1.NO=T2.NO

 

SPL solution:

   As it handles dynamic transpositions, SPL handles this type of transposition by creating the target structure first and then populating data to it. Below is the SPL script:


A

1

=T("Employee.csv").select(SALARY >10000).sort@z(SALARY)

2

=A1.select(DEPT:"Sales")

3

=A1.select(DEPT:"R&D")

4

=create('SALESNAME',SALARY,'RDNAME',SALARY)

5

=A4.paste(A2.(NAME),A2.(SALARY),A3.(NAME),A3.(SALARY))

A1: Import Employee data table, select records where salaries are above 10,000, and sort them by salary in descending order.

A2: Get records of sales department.

A3: Get records of R&D department.

A4: Create an empty table of the target data structure.

A5: Use A.paste() function to paste result values to corresponding columns.

 

We can see that SQL’s static transposition methods PIVOT and UNPIVOT have limited applications and are supported only by certain database products. The SQL query will often be too complicated when it tries to handle certain complex static transposition scenarios. Moreover, SQL lacks a standard method of dealing with them.

SPL provides a flexible and adaptable method for handling various complicated transposition tasks. More importantly, the language has a clear and stable logic. It will first create the target data structure and then populated the calculated result to the table.

In the next essay in the transposition series, we will introduce how SQL and SPL handle dynamic transpositions.


DailyTime.txt

PaymentDetail.txt

Employee.csv