Comparison of SQL & SPL: Join Operations (Ⅰ)

 

The join operation is used to combine records of two or more tables. This essay explains the handling of join operations using SQL and SPL. By exploring solutions and basic principles of the two commonly used programming languages and providing sample programs written in them, we try to find the faster and more efficient way for you. Looking Comparison of SQL & SPL: Join Operations (Ⅰ) for details.

 

The independent relationship between data of tables is called table association. There are four types of table relationships – one-to-one, many-to-one, one-to-many and many-to-many. Two or more tables can be joined through the associative relationship for performing an associative query on them.

 

Ⅰ. One-to-one relationship

The one-to-one relationship exists between two tables where a record in a table corresponds to only one record in another table; and vice versa, and usually, that use the same primary key fields.

Suppose there are a student table and student_contacts table. Both use student ID as the primary key. The former stores student names, genders, birthdays, departments, and classes. The latter records contacts of students and their addresses. Each student record corresponds to a piece of contact information, and each contact corresponds to one student.

 

Example 1Based on STUDENT table and STUDENT_CONTACTS table, get names of students and their contacts and contacts’ addresses for students who have the contact information. Below is part of the source data:

STUDENT:

ID

NAME

DEPARTMENTID

CLASSID

GENDER

BIRTHDAY

1

Rebecca

1

1

F

2010/09/08

2

Ashley

1

1

F

2010/10/09

3

Rachel

1

1

F

2011/04/29

4

Emily

1

1

F

2010/11/24

5

Ashley

1

1

F

2011/03/03

STUDENT_CONTACTS:

ID

CONTACTS

ADDRESS

1

Mrs. Moore

124 Guangming North Road

2

Mrs. Wilson

116 Baishi Road

3

Mr. Johnson

No.8, Mingcheng Road, Haidian   District

4

Mr. Smith

12 Fuxing Road

5

Mr. Smith

462 Shijingshan Road

 

SQL solution:

There could be some students who do not have contacts information. To find names and contact information of target students, we use the inner join to select records from STUDENT table where IDs exist in both tables. An inner join is often called a join simply. It will delete all rows that cannot match the other table from the result table. SQL INNER JOIN inner joins the first and the second table and ON statement defines the joining condition. Below is SQL statements:

 

   SELECT

      S.NAME,C.CONTACTS,C.ADDRESS

   FROM

      STUDENT S

   INNER JOIN

   STUDENT_CONTACTS C

   ON S.ID=C.ID

 

SPL solution:

SPL calls two or more tables having one-to-one relationship between them the homo-dimension tables. One is the homo-dimension table of the other, and vice versa. SPL join() function is used to perform a join operation, an inner join operation, by default.


A

1

=T("Student.txt")

2

=T("StudentContacts.txt")

3

=join(A1:S,ID;A2:C,ID)

4

=A3.new(S.NAME,C.CONTACTS,C.ADDRESS)

A1: Import Student table from the source file.

A2: Import StudentContacts table from the source file.

A3: Inner joins the two tables through their ID field.

A4: Create a new table sequence made up of fields storing student names, their contacts and addresses.

 

SPL supports retrieving a data table from the database. Suppose the data comes from database "db"’s "STUDENT" table, A1 in the above SPL script can be rewritten as follows:


A

1

=connect("db").query("select * from STUDENT")

 

 

Example 2Based on EMPLOYEE table and MANAGER table, get salaries, including the allowance, of all employees, including managers. Below is part of the source data:

EMPLOYEE:

ID

NAME

BIRTHDAY

STATE

DEPT

SALARY

1

Rebecca

1974/11/20

California

R&D

7000

2

Ashley

1980/07/19

New York

Finance

11000

3

Rachel

1970/12/17

New Mexico

Sales

9000

4

Emily

1985/03/07

Texas

HR

7000

5

Ashley

1975/05/13

Texas

R&D

16000

MANAGER:

ID

ALLOWANCE

18

7000

2

11000

4

7000

6

10000

7

9000

 

 SQL solution:

Managers are employees, too, but the MANAGER table stores allowance information. To query salaries of all employees, the left join is needed to get records of all employees, including managers. The left join is also called left outer join. It joins two tables based on the left table and lists all records in the left table and records in the right table that match the left table according to the specified condition in the result table. SQL LEFT JOIN is used to left join tables. Below is SQL statements:

 

   SELECT

      E.ID,E.NAME,E.SALARY+NVL(M.ALLOWANCE,0) INCOME

   FROM EMPLOYEE E

   LEFT JOIN

   MANAGER M

   ON E.ID=M.ID

 

SPL solution:

SPL join() function is used to perform the join operation. @1 option enables a left join.


A

1

=T("Employee.csv")

2

=T("Manager.txt")

3

=join@1(A1:E, ID; A2:M, ID)

4

=A3.new(E.ID, E.NAME, E.SALARY+M.ALLOWANCE:INCOME)

A1: Import Employee table from the source file.

A2: Import Manager table from the source file.

A3: Left join the two tables through ID field based on the first table, the employee table.

A4: Create a new table sequence made up of fields storing employee IDs, names, and salaries.

 

The one-to-one relationship is the simplest among all relationships. Two tables are joined directly by matching their primary keys. Both SQL and SPL can handle this type of associative relationship effectively.

 

Ⅱ. One-to-many relationship

The one-to-many relationship exists between two tables where a record of one table corresponds to any one or more records in the other table. The table at the “one” end is called the primary table and the one at the “many” end is called the subtable (or subordinate table). Suppose there are orders table and order detail table, each order has the only ID but each order ID may correspond to multiple order detail records. We call the orders table the primary table and the order detail table the subtable.

 

Example 3Based on ORDERS table and ORDER_DETAIL table, calculate the total amount in each order. Below is part of the source data:

ORDERS:

ID

CUSTOMERID

EMPLOYEEID

ORDER_DATE

ARRIVAL_DATE

10248

VINET

5

2012/07/04

2012/08/01

10249

TOMSP

6

2012/07/05

2012/08/16

10250

HANAR

4

2012/07/08

2012/08/05

10251

VICTE

3

2012/07/08

2012/08/05

10252

SUPRD

4

2012/07/09

2012/08/06

ORDER_DETAIL:

ID

ORDER_NUMBER

PRODUCTID

PRICE

COUNT

DISCOUNT

10814

1

48

102.0

8

0.15

10814

2

48

102.0

8

0.15

10814

3

48

306.0

24

0.15

10814

4

48

102.0

8

0.15

10814

5

48

204.0

16

0.15

 

SQL solution:

SQL uses JOIN statement to handle the one-to-many relationship. Below is SQL statements:

 

   SELECT

      ID, SUM(PRICE*COUNT) AMOUNT

   FROM (

      SELECT

         Orders.ID, Detail.PRICE, Detail.COUNT

      FROM ORDERS Orders

      INNER JOIN

      ORDER_DETAIL Detail

      ON Orders.ID=Detail.ID

   )

   GROUP BY ID

   ORDER BY ID

 

SPL solution:

SPL uses join() function to join the primary table and the subtable through the subtable’s one key field and the primary table’s key when there is only one subtable.


A

1

=T("Orders.txt")

2

=T("OrderDetail.txt")

3

=join(A1:Orders,ID; A2:Detail,ID)

4

=A3.groups(Orders.ID; sum(Detail.PRICE*Detail.COUNT):AMOUNT)

A1: Import Orders table.

A2: Import OrderDetail table.

A3: Join the two tables through their ID fields.

A4: Group A3’s joining result table and calculate the total amount for each order.

 

Example 4Suppose the ORDERS table has another subtable for recording the payment information. We want to find the orders for which not all payment has been received, that is, those where the accumulative payment is less than the total order amount. Below is part of the source data:

ORDERS:

ID

CUSTOMERID

EMPLOYEEID

ORDER_DATE

ARRIVAL_DATE

10248

VINET

5

2012/07/04

2012/08/01

10249

TOMSP

6

2012/07/05

2012/08/16

10250

HANAR

4

2012/07/08

2012/08/05

10251

VICTE

3

2012/07/08

2012/08/05

10252

SUPRD

4

2012/07/09

2012/08/06

ORDER_DETAIL:

ID

ORDER_NUMBER

PRODUCTID

PRICE

COUNT

DISCOUNT

10814

1

48

102.0

8

0.15

10814

2

48

102.0

8

0.15

10814

3

48

306.0

24

0.15

10814

4

48

102.0

8

0.15

10814

5

48

204.0

16

0.15

ORDER_PAYMENT:

ID

PAY_DATE

AMOUNT

CHANNEL

INSTALMENTS

10814

2014/01/05

816.0

3

0

10848

2014/01/23

800.25

2

1

10848

2014/01/23

800.25

0

0

10848

2014/01/23

800.25

3

1

10966

2014/03/20

572.0

2

1

 

SQL solution:

Just JOINing the three tables is not the right way of doing this because there could be many-to-many relationship between ORDER_DETAIL table and ORDER_PAYMENT table. When the subtables are grouped by order ID, the field will become their unique, actual primary keys. Then the three tables can be joined through the ID field.

 

   SELECT

      Orders.ID,Detail.AMOUNT,Payment.PAY_AMOUNT

   FROM ORDERS Orders

   INNER JOIN

   (

      SELECT ID, SUM(PRICE*COUNT) AMOUNT

      FROM ORDER_DETAIL

      GROUP BY ID

   ) Detail

   ON Orders.ID=Detail.ID

   INNER JOIN

   (

      SELECT ID, SUM(AMOUNT) PAY_AMOUNT

      FROM ORDER_PAYMENT

      GROUP BY ID

   ) Payment

   ON Orders.ID=Payment.ID

   WHERE PAY_AMOUNT<Detail.AMOUNT

   ORDER BY ID

 

SPL solution:

When the subtables are grouped by ID, the field will become the real primary key. Now we can treat them as tables with the one-to-one relationship in between (or the homo-dimension tables).


A

1

=T("Orders.txt")

2

=T("OrderDetail.txt")

3

=T("OrderPayment.txt")

4

=A2.groups(ID; sum(PRICE*COUNT):AMOUNT)

5

=A3.groups(ID; sum(AMOUNT):PAY_AMOUNT)

6

=join(A1:Orders,ID; A4:Detail,ID; A5:Payment,ID)

7

=A6.new(Orders.ID, Detail.AMOUNT, Payment.PAY_AMOUNT)

8

=A7.select(PAY_AMOUNT<AMOUNT)

A1: Import Orders table.

A2: Import OrderDetail table.

A3: Import OrderPayment table.

A4: Group OrderDetail table and calculate the total amount in each order.

A5: Group OrderPayment table and calculate the total payment amount in each order.

A6: The join() function joins Orders table and the grouped & summarized OrderDetail table and OrderPayment table through their ID fields.

A7: Create a new table sequence consisting of fields of order ID, order amount and order payment amount.

A8: Select records where the payment amount is less than the order amount, that is, those that have not received all payment.

 

Without the support of stepwise coding, SQL will write the whole procedure in a single statement, which is sure to be complicated. SPL displays clear logic by using the step-by-step coding mode. There are two steps for performing a join operation. The first is to group every subtable by one key field (which is the primary table’s primary key) to make them have the same actual primary key as the primary table. The second is to join the two or more tables through their primary keys (or the actual primary keys). In essence, SQL and SPL implement the join operations in the same way, but SPL’s stepwise design makes easy coding.

Student.txt

StudentContacts.txt

Employee.csv

Manager.txt

Orders.txt

OrderDetail.txt

OrderPayment.txt