Comparison of SQL & SPL: Static Transposition

 

Abstract

Data transposition aims to convert queried data into a specified format to display using front-end applications, such as reporting tools. There are row-to-column transposition, column-to-row transposition, and more complicated dynamic transposition. This essay focuses on solutions and basic principles of SQL and SPL, the two commonly used programming languages, in handling transposition scenarios, and tries to find the convenient and efficient way for you through sample programs in SQL and SPL.

 

A static transposition can define the data structure after data is transposed beforehand and the structure does not change accordingly as data is changed.

 

1. Row-to-column transposition

As the name shows, row-to-column transposition aims to convert row values into column names to transfer rows to columns. In practice, PIVOT is often used after the grouping and aggregation, which gets unique values from the row values under the to-be-transposed columns and then arrange the row values into column names. The role of PIVOT is to group and put the aggregates in a specific column into multiple columns for an intuitive representation.

 

Example 1Based on the following scores table, find the highest score of each subject in each class and present the results in columns. Below is part of the source table:

CLASS

STUDENTID

SUBJECT

SCORE

1

1

English

84

1

1

Math

77

1

1

PE

69

1

2

English

81

1

2

Math

80

Below is the result of expected layout:

CLASS

MAX_MATH

MAX_ENGLISH

MAX_PE

1

97

96

97

2

97

96

97

 

SQL solution:

SQL PIVOT is used to achieve row-to-column transposition and column-to-row transposition. But only the relatively new versions of certain database products support this method. ORACLE, for instance, supports it from the 11g version, and some databases, such as MYSQL, still uses the subquery to do the grouping and aggregation and then a left join to achieve the transposition and have not given any support to it. In this case, we use ORACLE 11g to write the SQL query:

 

   SELECT *

   FROM (

      SELECT

         CLASS, SUBJECT, SCORE

      FROM SCORES

   )

   PIVOT (

      MAX(SCORE) FOR SUBJECT

      IN (

         'Math' AS MAX_MATH,

         'English' AS MAX_ENGLISH,

         'PE' AS MAX_PE

      )

   )

 

SPL solution:

   SPL offers A.pivot() function to perform transposition, row-to-column by default.


A

1

=T("Scores.csv")

2

=A1.groups(CLASS,SUBJECT; max(SCORE):MAX_SCORE)

3

=A2.pivot(CLASS; SUBJECT, MAX_SCORE; "Math":"MAX_MATH", "English":"MAX_ENGLISH", "PE":"MAX_PE")

A1: Import Scores table from the source file.

A2: Group A1’s table by class and subject and calculate the highest score of each subject in each class.

A3: A.pivot() function transposes the row-wise highest scores into column names.

 

SPL supports retrieving a data table from the database, too. A1 in the above script can be modified as:


A

1

=connect("db").query("SELECT * FROM SCORES")

 

2. Column-to-row transposition

Contrary to row-to-column transposition, the column-to-row transposition converts each field into a new row, where the new field value comes from the original field name or alias, and transforms the original field values into field values of a new field.

 

Example 2Based on the following Olympic medal table, generate a new table recording information for each type of medal. Below is part of the source table:

Game

Nation

Gold

Silver

Copper

30

USA

46

29

29

30

China

38

27

23

30

UK

29

17

19

30

Russia

24

26

32

30

Korea

13

8

7

Below is the result of expected layout:

GAME

NATION

MEDAL_TYPE

MEDALS

30

USA

GOLD

46

30

USA

SILVER

29

30

USA

COPPER

29

30

China

GOLD

38

30

China

SILVER

27

 

 

SQL solution:

   SPL uses UNPIVOT function to perform a column-to-row transposition:

 

   SELECT *

   FROM OLYMPIC

   UNPIVOT (

      MEDALS FOR MEDAL_TYPE IN (

          GOLD,SILVER,COPPER

      )

   )

 

SPL solution:

   SPL A.pivot() function works with @r option to perform a column-to-row transposition:


A

1

=T("Olympic.txt")

2

=A1.pivot@r(GAME,NATION; MEDAL_TYPE, MEDALS; GOLD, SILVER, COPPER)

A1: Import Olympic medal table.

A2: A.pivot@r() function transposes columns GOLD, SILVER and COPPER into new rows.

 

3. Scenarios containing both types of transpositions

 

Example 3Based on the following channel-based sales table, generate a new table storing information by date. Below is part of the source table:

YEAR

MONTH

ONLINE

STORE

2020

1

2440

3746.2

2020

2

1863.4

448.0

2020

3

1813.0

624.8

2020

4

670.8

2464.8

2020

5

3730.0

724.5

Below is the result of expected layout:

CATEGORY

1

2

3

ONLINE

2440

1863.4

1813.0

STORE

3746.2

448.0

624.8

 

 SQL solution:

    We need to perform both row-to-column transposition and column-to-row transposition to get this done. First, we perform column-to-row transposition to transform channel types into values under CATEGORY field:

YEAR

MONTH

CATEGORY

AMOUNT

2020

1

ONLINE

2440

2020

1

STORE

3746.2

2020

2

ONLINE

1863.4

2020

2

STORE

448.0

 

Then we perform row-to-column transposition to convert MONTH values into column names. The complete SQL query is as follows:

 

   SELECT *

   FROM (

      SELECT *

      FROM MONTH_SALES

      UNPIVOT (

         AMOUNT FOR CATEGORY IN (

            "ONLINE",STORE

         )

      )

      WHERE YEAR=2020

   )

   PIVOT (

      MAX(AMOUNT) FOR MONTH

      IN (

         1 AS "1",2 AS "2",2 AS "3",

         4 AS "4",5 AS "5",6 AS "6",

         7 AS "7",8 AS "8",9 AS "9",

         10 AS "10",11 AS "11",12 AS "12"

      )

   )

 

SPL does not support using a non-constant expression as PIVOT/UNPIVOT value, so all months need to be enumerated for the row-to-column transposition.

 

SPL solution:

   According to the natural logic, SPL handles the task using A.pivot@r() and A.pivot() respectively for column-to-row transposition and row-to-column transposition:


A

1

=T("MonthSales.csv").select(YEAR:2020)

2

=A1.pivot@r(YEAR,MONTH; CATEGORY, AMOUNT)

3

=A2.pivot(CATEGORY; MONTH, AMOUNT)

A1: Import MonthSales table and select records of the year 2020.

A2: A.pivot@r() performs column-to-row transposition to convert channel types into values of CATEGORY field.

A3: A.pivot () performs row-to-column transposition to transform MONTH field values into column names.

  

Both SQL and SPL handle simple transposition scenarios well. The issue is that the real-world situations cannot always dealt with using the mode of grouping & aggregation plus PIVOT. In the subsequent essays in our transposition series, we will introduce how the two languages handle complicated static transpositions and dynamic transpositions.

 


Scores.csv

Olympic.txt

MonthSales.csv