SPL: One-to-One Table Associations

 

A JOIN operation is used to join records of two or more tables. This essay explains SPL’s way of handling JOINs and the underlying principles.

Table association relationships are data dependence relationships between tables. There are four types of table association relationships – one-to-one, many-to-one, one-to-many and many-to-many. We can combine two or more tables through one of those relationships to achieve the multi-table join query.

In a one-to-one association relationship, each record of a table corresponds to one and only one record in the other table and vice versa. The relationship often exists between two tables using the same primary key field. The common primary key creates a one-to-one correspondence between records of the two tables.

One example is between the student table and the student contact table that use the same primary key student ID. The former stores basic student information like name, gender, birthday, department and class. The latter stores students’ contact information like contacts and addresses. Each student record corresponds to a unique contact record, and each contact record matches one student record only.

Example 1 Based on STUDENT table and STUDENT_CONTACTS table, we are trying to find names, contacts and addresses of students who provide their contact information. Below is part of the data:

STUDENT:

ID

NAME

DEPARTMENTID

CLASSID

GENDER

BIRTHDAY

1

Rebecca

1

1

F

2010/09/08

2

Ashley

1

1

F

2010/10/09

3

Rachel

1

1

F

2011/04/29

4

Emily

1

1

F

2010/11/24

5

Ashley

1

1

F

2011/03/03

STUDENT_CONTACTS:

ID

CONTACTS

ADDRESS

1

Mrs. Moore

124 Guangming North Road

2

Mrs. Wilson

116 Baishi Road

3

Mr. Johnson

No.8, Mingcheng Road, Haidian District

4

Mr. Smith

12 Fuxing Road

5

Mr. Smith

462 Shijingshan Road

According to the task description, there are students who don’t leave their contact information. To find the names and contact information of eligible students, we need an inner join that only selects records whose student ID exist in both tables. An inner join is just called the join, which will delete all rows from each of the two tables that cannot match any of the row in the other table.

In SPL, the two or multiple tables that have a one-to-one correspondence relationship are called homo-dimension tables. The SPL join() function is used to perform a join, which is an inner join by default.

The SPL script for achieving the above join query:


A

1

=T("Student.txt")

2

=T("StudentContacts.txt")

3

=join(A1:S,ID;A2:C,ID)

4

=A3.new(S.NAME,C.CONTACTS,C.ADDRESS)

A1: Import STUDENT table from the corresponding file.

A2: Import STUDENT_CONTACTS table from the corresponding file.

A3: Inner joins the two tables on student ID.

A4: Create a new table sequence made up of three fields – student name, contact and address.

SPL can retrieve a data table from a database. Suppose data comes from "STUDENT" table stored in database "db", A1’s code will like this:


A

1

=connect("db").query("select * from STUDENT")

Example 2 Based on EMPLOYEE table and MANAGER table, we are trying to get incomes (including the allowance) of all employees (including the manager). Below is part of the data:

EMPLOYEE:

ID

NAME

BIRTHDAY

STATE

DEPT

SALARY

1

Rebecca

1974/11/20

California

R&D

7000

2

Ashley

1980/07/19

New York

Finance

11000

3

Rachel

1970/12/17

New Mexico

Sales

9000

4

Emily

1985/03/07

Texas

HR

7000

5

Ashley

1975/05/13

Texas

R&D

16000

MANAGER:

ID

ALLOWANCE

18

7000

2

11000

4

7000

6

10000

7

9000

Managers are employees though the MANAGER table stores special information like allowance. To find incomes of all employees, we need a left join that selects all employees, including managers. A left join is also called left outer join, which joins two tables based on the left table. Its result set contains all records in the left table and the records in the right table that have matches in the left table according to the joining field. SPL join() function works with @1 option to perform a left join.

The SPL script for achieving the join query:


A

1

=T("Employee.csv")

2

=T("Manager.txt")

3

=join@1(A1:E, ID; A2:M, ID)

4

=A3.new(E.ID, E.NAME, E.SALARY+M.ALLOWANCE:INCOME)

A1: Import EMPLOYEE table from the corresponding file.

A2: Import MANAGER table from the corresponding file.

A3: Left join the two tables on employee ID based on the first table (EMPLOYEE table).

A4: Create a new table sequence made up of three fields – employee ID, name and total income.

The one-to-one association relationship is the simplest among all association relationships. With such a relationship, two tables are joined directly by their primary keys. Both SQL and SPL can handle join queries with one-to-one association well.

Employee.csv

Manager.txt

Student.txt

StudentContacts.txt