"The join operation is used to combine records of two or more tables. This essay explains the han .."

pjf RaqForum 22 No.
1 Reply • 829 View • 3 Years ago

Comparison of SQL & SPL: Join Operations (Ⅰ)

The join operation is used to combine records of two or more tables. This essay explains the handling of join operations using SQL and SPL. By exploring solutions and basic principles of the two commonly used programming languages and providing sample programs written in them, we try to find the faster and more efficient way for you. Looking Comparison of SQL & SPL: Join Operations (Ⅰ) for details.

The independent relationship between data of tables is called table association. There are four types of table relationships – one-to-one, many-to-one, one-to-many and many-to-many. Two or more tables can be joined through the associative relationship for performing an associative query on them.

Ⅰ. One-to-one relationship

The one-to-one relationship exists between two tables where a record in a table corresponds to only one record in another table; and vice versa, and usually, that use the same primary key fields.

Suppose there are a student table and student_contacts table. Both use student ID as the primary key. The former stores student names, genders, birthdays, departments, and classes. The latter records contacts of students and their addresses. Each student record corresponds to a piece of contact information, and each contact corresponds to one student.

【Example 1】Based on STUDENT table and STUDENT_CONTACTS table, get names of students and their contacts and contacts’ addresses for students who have the contact information. Below is part of the source data:

STUDENT:

ID	NAME	DEPARTMENTID	CLASSID	GENDER	BIRTHDAY
1	Rebecca	1	1	F	2010/09/08
2	Ashley	1	1	F	2010/10/09
3	Rachel	1	1	F	2011/04/29
4	Emily	1	1	F	2010/11/24
5	Ashley	1	1	F	2011/03/03
…	…	…	…	…	…

STUDENT_CONTACTS:

ID	CONTACTS	ADDRESS
1	Mrs. Moore	124 Guangming North Road
2	Mrs. Wilson	116 Baishi Road
3	Mr. Johnson	No.8, Mingcheng Road, Haidian District
4	Mr. Smith	12 Fuxing Road
5	Mr. Smith	462 Shijingshan Road
…	…	…

SQL solution:

There could be some students who do not have contacts information. To find names and contact information of target students, we use the inner join to select records from STUDENT table where IDs exist in both tables. An inner join is often called a join simply. It will delete all rows that cannot match the other table from the result table. SQL INNER JOIN inner joins the first and the second table and ON statement defines the joining condition. Below is SQL statements:

SELECT

S.NAME,C.CONTACTS,C.ADDRESS

FROM

STUDENT S

INNER JOIN

STUDENT_CONTACTS C

ON S.ID=C.ID

SPL solution:

SPL calls two or more tables having one-to-one relationship between them the homo-dimension tables. One is the homo-dimension table of the other, and vice versa. SPL join() function is used to perform a join operation, an inner join operation, by default.

	A
1	=T("Student.txt")
2	=T("StudentContacts.txt")
3	=join(A1:S,ID;A2:C,ID)
4	=A3.new(S.NAME,C.CONTACTS,C.ADDRESS)

A1: Import Student table from the source file.

A2: Import StudentContacts table from the source file.

A3: Inner joins the two tables through their ID field.

A4: Create a new table sequence made up of fields storing student names, their contacts and addresses.

SPL supports retrieving a data table from the database. Suppose the data comes from database "db"’s "STUDENT" table, A1 in the above SPL script can be rewritten as follows:

	A
1	=connect("db").query("select * from STUDENT")

【Example 2】Based on EMPLOYEE table and MANAGER table, get salaries, including the allowance, of all employees, including managers. Below is part of the source data:

EMPLOYEE:

ID	NAME	BIRTHDAY	STATE	DEPT	SALARY
1	Rebecca	1974/11/20	California	R&D	7000
2	Ashley	1980/07/19	New York	Finance	11000
3	Rachel	1970/12/17	New Mexico	Sales	9000
4	Emily	1985/03/07	Texas	HR	7000
5	Ashley	1975/05/13	Texas	R&D	16000
…	…	…	…	…	…

MANAGER:

ID	ALLOWANCE
18	7000
2	11000
4	7000
6	10000
7	9000
…	…

SQL solution:

Managers are employees, too, but the MANAGER table stores allowance information. To query salaries of all employees, the left join is needed to get records of all employees, including managers. The left join is also called left outer join. It joins two tables based on the left table and lists all records in the left table and records in the right table that match the left table according to the specified condition in the result table. SQL LEFT JOIN is used to left join tables. Below is SQL statements:

SELECT

E.ID,E.NAME,E.SALARY+NVL(M.ALLOWANCE,0) INCOME

FROM EMPLOYEE E

LEFT JOIN

MANAGER M

ON E.ID=M.ID

SPL solution:

SPL join() function is used to perform the join operation. @1 option enables a left join.

	A
1	=T("Employee.csv")
2	=T("Manager.txt")
3	=join@1(A1:E, ID; A2:M, ID)
4	=A3.new(E.ID, E.NAME, E.SALARY+M.ALLOWANCE:INCOME)

A1: Import Employee table from the source file.

A2: Import Manager table from the source file.

A3: Left join the two tables through ID field based on the first table, the employee table.

A4: Create a new table sequence made up of fields storing employee IDs, names, and salaries.

The one-to-one relationship is the simplest among all relationships. Two tables are joined directly by matching their primary keys. Both SQL and SPL can handle this type of associative relationship effectively.

Ⅱ. One-to-many relationship

The one-to-many relationship exists between two tables where a record of one table corresponds to any one or more records in the other table. The table at the “one” end is called the primary table and the one at the “many” end is called the subtable (or subordinate table). Suppose there are orders table and order detail table, each order has the only ID but each order ID may correspond to multiple order detail records. We call the orders table the primary table and the order detail table the subtable.

【Example 3】Based on ORDERS table and ORDER_DETAIL table, calculate the total amount in each order. Below is part of the source data:

ORDERS:

ID	CUSTOMERID	EMPLOYEEID	ORDER_DATE	ARRIVAL_DATE
10248	VINET	5	2012/07/04	2012/08/01
10249	TOMSP	6	2012/07/05	2012/08/16
10250	HANAR	4	2012/07/08	2012/08/05
10251	VICTE	3	2012/07/08	2012/08/05
10252	SUPRD	4	2012/07/09	2012/08/06
…	…	…	…	…

ORDER_DETAIL:

ID	ORDER_NUMBER	PRODUCTID	PRICE	COUNT	DISCOUNT
10814	1	48	102.0	8	0.15
10814	2	48	102.0	8	0.15
10814	3	48	306.0	24	0.15
10814	4	48	102.0	8	0.15
10814	5	48	204.0	16	0.15
…	…	…	…	…	…

SQL solution:

SQL uses JOIN statement to handle the one-to-many relationship. Below is SQL statements:

SELECT

ID, SUM(PRICE*COUNT) AMOUNT

FROM (

SELECT

Orders.ID, Detail.PRICE, Detail.COUNT

FROM ORDERS Orders

INNER JOIN

ORDER_DETAIL Detail

ON Orders.ID=Detail.ID

)

GROUP BY ID

ORDER BY ID

SPL solution:

SPL uses join() function to join the primary table and the subtable through the subtable’s one key field and the primary table’s key when there is only one subtable.

	A
1	=T("Orders.txt")
2	=T("OrderDetail.txt")
3	=join(A1:Orders,ID; A2:Detail,ID)
4	=A3.groups(Orders.ID; sum(Detail.PRICE*Detail.COUNT):AMOUNT)

A1: Import Orders table.

A2: Import OrderDetail table.

A3: Join the two tables through their ID fields.

A4: Group A3’s joining result table and calculate the total amount for each order.

【Example 4】Suppose the ORDERS table has another subtable for recording the payment information. We want to find the orders for which not all payment has been received, that is, those where the accumulative payment is less than the total order amount. Below is part of the source data:

ORDERS:

ID	CUSTOMERID	EMPLOYEEID	ORDER_DATE	ARRIVAL_DATE
10248	VINET	5	2012/07/04	2012/08/01
10249	TOMSP	6	2012/07/05	2012/08/16
10250	HANAR	4	2012/07/08	2012/08/05
10251	VICTE	3	2012/07/08	2012/08/05
10252	SUPRD	4	2012/07/09	2012/08/06
…	…	…	…	…

ORDER_DETAIL:

ID	ORDER_NUMBER	PRODUCTID	PRICE	COUNT	DISCOUNT
10814	1	48	102.0	8	0.15
10814	2	48	102.0	8	0.15
10814	3	48	306.0	24	0.15
10814	4	48	102.0	8	0.15
10814	5	48	204.0	16	0.15
…	…	…	…	…	…

ORDER_PAYMENT:

ID	PAY_DATE	AMOUNT	CHANNEL	INSTALMENTS
10814	2014/01/05	816.0	3	0
10848	2014/01/23	800.25	2	1
10848	2014/01/23	800.25	0	0
10848	2014/01/23	800.25	3	1
10966	2014/03/20	572.0	2	1
…	…	…	…	…

SQL solution:

Just JOINing the three tables is not the right way of doing this because there could be many-to-many relationship between ORDER_DETAIL table and ORDER_PAYMENT table. When the subtables are grouped by order ID, the field will become their unique, actual primary keys. Then the three tables can be joined through the ID field.

SELECT

Orders.ID,Detail.AMOUNT,Payment.PAY_AMOUNT

FROM ORDERS Orders

INNER JOIN

(

SELECT ID, SUM(PRICE*COUNT) AMOUNT

FROM ORDER_DETAIL

GROUP BY ID

) Detail

ON Orders.ID=Detail.ID

INNER JOIN

(

SELECT ID, SUM(AMOUNT) PAY_AMOUNT

FROM ORDER_PAYMENT

GROUP BY ID

) Payment

ON Orders.ID=Payment.ID

WHERE PAY_AMOUNT<Detail.AMOUNT

ORDER BY ID

SPL solution:

When the subtables are grouped by ID, the field will become the real primary key. Now we can treat them as tables with the one-to-one relationship in between (or the homo-dimension tables).

	A
1	=T("Orders.txt")
2	=T("OrderDetail.txt")
3	=T("OrderPayment.txt")
4	=A2.groups(ID; sum(PRICE*COUNT):AMOUNT)
5	=A3.groups(ID; sum(AMOUNT):PAY_AMOUNT)
6	=join(A1:Orders,ID; A4:Detail,ID; A5:Payment,ID)
7	=A6.new(Orders.ID, Detail.AMOUNT, Payment.PAY_AMOUNT)
8	=A7.select(PAY_AMOUNT<AMOUNT)

A1: Import Orders table.

A2: Import OrderDetail table.

A3: Import OrderPayment table.

A4: Group OrderDetail table and calculate the total amount in each order.

A5: Group OrderPayment table and calculate the total payment amount in each order.

A6: The join() function joins Orders table and the grouped & summarized OrderDetail table and OrderPayment table through their ID fields.

A7: Create a new table sequence consisting of fields of order ID, order amount and order payment amount.

A8: Select records where the payment amount is less than the order amount, that is, those that have not received all payment.

Without the support of stepwise coding, SQL will write the whole procedure in a single statement, which is sure to be complicated. SPL displays clear logic by using the step-by-step coding mode. There are two steps for performing a join operation. The first is to group every subtable by one key field (which is the primary table’s primary key) to make them have the same actual primary key as the primary table. The second is to join the two or more tables through their primary keys (or the actual primary keys). In essence, SQL and SPL implement the join operations in the same way, but SPL’s stepwise design makes easy coding.

SPL Official Website 👉 https://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL

SPL Learning Material 👉 https://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/cFTcUNs7