"https://c.scudata.com/article/1733286898175 8.1.1 Ordinary foreign key association Part of the d .."

mars RaqForum 25 No.
69 View • 1 Months ago

8.1 Foreign key association

example codes for comparing spl sql and python(31)

8.1.1 Ordinary foreign key association

Part of the data in the sales record table, city information table and product information table are given as below:

Sales record table (fact table):

recordid	product	product_city	sale_city	amount	…
sr100001	p1006	c105	c103	603	…
sr100002	p1005	c105	c102	1230	…
sr100003	p1003	c102	c102	885	…
…	…	…	…	…	…

City information table (dimension table):

cityid	name	state	…
c101	New York	New York	…
c102	Miami	Florida	…
c103	Los Angeles	California	…
…	…	…	…

Product information table (dimension table):

productid	pclass	…
p1001	A	…
p1002	A	…
p1003	B	…
…	…	…

Perform the following calculations based on the above data:

1. One fact table and one dimension table - aggregate the sales of each product.

2. One fact table and multiple dimension tables - count the sales of each product of each state.

3. Reuse dimension table

1) Find the sales record of product sold and manufactured in the same state.

2) Calculate the sales of product sold in California.

3) Calculate the total sales of each state as the place of production.

4. Multi-layer dimension table – find the sales records of product manufactured and sold in the same state.

Sometimes, the city information of a certain product is not in the sales record table but in the product information table. In this case, the association is a multi-layer dimension table association.

The modified sales record table and product information table are as follows:

Sales record table 2:

recordid	product	sale_city	amount	…
sr100001	p1003	c104	380	…
sr100002	p1005	c103	400	…
sr100003	p1003	c104	626	…
…	…	…	…	…

Product information table 2:

productid	product_city	…
p1001	c104	…
p1002	c103	…
p1003	c102	…
…	…	…

SPL

	A	B
1	=file(“SaleRecord.csv”).import@tc()
2	=file(“Product.csv”).import@tc()
3	=A1.switch(product,A2:productid)	/1. One fact table and one dimension table
4	=A3.groups(product.pclass;sum(amount):amount)	/1. One fact table and one dimension table
5	=file(“City.csv”).import@tc()
6	=A1.switch(product,A2:productid;sale_city,A5:cityid)	/2. One fact table and multiple dimension tables
7	=A6.groups(sale_city.state,product.pclass;sum(amount):amount)	/2. One fact table and multiple dimension tables
8	=A1.switch(sale_city,A5:cityid;product_city,A5:cityid)	/3. Reuse dimension table
9	=A8.select(sale_city.state==product_city.state).(recordid)	/3. Reuse dimension table
10	=A8.select(sale_city.state==“California”).sum(amount)	/The sales of product sold in California
11	=A8.groups(product_city.state;sum(amount):amount)	/The total sales of each state as the place of production.
12	=file(“SaleRecord2.csv”).import@tc()
13	=file(“Product2.csv”).import@tc()
14	=A13.switch(product_city,A5:cityid)	/4. Multi-layer dimension table
15	=A12.switch(sale_city,A5:cityid;product,A14:productid)	/4. Multi-layer dimension table
16	=A15.select(sale_city.state==product.product_city.state).(recordid)	/4. Multi-layer dimension table

The switch() function is to convert the foreign key field to the corresponding dimension table record. This function only references the dimension table record and does not copy the data and, it can parse multiple association relations at the same time and reuse the relations.

Notes: The above four tasks are independent of each other. Although the codes are written together, when executing, the code for the other three tasks should be commented out to execut expected code, otherwise the field that has been switched cannot be switched again.

The results of A3:

It should be specially noted that the associative relation in SPL is to directly reference the associative record, which allows all associative relations to be established in one go when loading data. In addition, SPL saves the associated results in a pointer manner, making it convenient to use this associative relation for subsequent calculations, which is commonly referred to as pre-association. For example, A8 is the pre-associated data, which can be directly referenced in subsequent A9, A10, and A11.

SQL

1. One fact table and one dimension table - aggregate the sales of each product

SELECT p.pclass AS pclass, SUM(s.amount) AS amount
FROM SaleRecord s
JOIN product p ON s.product = p.productid
GROUP BY p.pclass;

2. One fact table and multiple dimension tables - count the sales of each product of each state

SELECT c.state AS state, p.pclass AS pclass, SUM(s.amount) AS amount
FROM SaleRecord s
JOIN City c ON s.sale_city = c.cityid
JOIN Product p ON s.product = p.productid
GROUP BY c.state, p.pclass;

3. Reuse dimension table 1 - find the sales record of product sold & manufactured in the same state

SELECT s.recordid
FROM SaleRecord s
JOIN City c1 ON s.product_city = c1.cityid
JOIN City c2 ON s.sale_city = c2.cityid
JOIN Product p ON s.product = p.productid
WHERE c1.state = c2.state;

4. Reuse dimension table 2 - calculate the sales of product sold in California

SELECT SUM(amount) AS amount
FROM SaleRecord s
JOIN City c ON s.sale_city=c.cityid
WHERE c.state = 'California';

5. Reuse dimension table 3 - calculate the total sales of each state as the place of production.

SELECT c.state AS state, SUM(s.amount) AS amount
FROM SaleRecord s
JOIN City c ON s.product_city = c.cityid
GROUP BY c.state;

6. Multi-layer dimension table - find the sales records of product manufactured & sold in the same state.

SELECT s.recordid
FROM SaleRecord2 s
JOIN Product2 p ON s.product = p.productid
JOIN city c ON s.sale_city=c.cityid
JOIN city c2 ON p.product_city=c2.cityid
WHERE c.state=c2.state
ORDER BY s.recordid;

Python

record1=pd.read_csv("../SaleRecord.csv")
product1=pd.read_csv("../Product.csv")
r_pt=pd.merge(record1,product1,left_on="product",right_on="productid") #1.One fact table and one dimension table
pclass_sale=r_pt.groupby('pclass',as_index=False).amount.sum()
ct1=pd.read_csv("../City.csv")
r_ct=pd.merge(record1,ct1,left_on="sale_city",right_on="cityid") #2.One fact table and multiple dimension tables
r_ct_pdt=pd.merge(r_ct,product1,left_on="product",right_on="productid") #2.One fact table and multiple dimension tables
ct_pdt_sale=r_ct_pdt.groupby(['state','pclass'],as_index=False).amount.sum()
r_ct2=pd.merge(record1,ct1,left_on="sale_city",right_on="cityid") #3. Reuse dimension table
r_ct_ct=pd.merge(r_ct2,ct1,left_on="product_city",right_on="cityid",suffixes=('_s', '_p')) /#3. Reuse dimension table
r_ct_p_ct= r_ct_ct[r_ct_ct['state_s']==r_ct_ct['state_p']].recordid.values.tolist()
#ThesalesofproductsoldinCalifornia
hlj_sale_amount=r_ct_ct[r_ct_ct['state_s']=="California"].amount.sum()
#Thetotalsalesofeachstateastheplaceofproduction
state_product_amount=r_ct_ct.groupby(['state_p']).amount.sum()
record2=pd.read_csv("../SaleRecord2.csv")
product2=pd.read_csv("../Product2.csv")
pdt_ct=pd.merge(product2,ct1,left_on="product_city",right_on="cityid")
r_pdt_ct=pd.merge(record2,pdt_ct,left_on="product",right_on="productid") #4. Multi-layer dimension table
r_pdt_ct_ct=pd.merge(r_pdt_ct,ct1,left_on="sale_city",right_on="cityid",suffixes=('_s', '_p')) #4. Multi-layer dimension table
r_ct_p_ct2=r_pdt_ct_ct[r_pdt_ct_ct['state_s']==r_pdt_ct_ct['state_p']].recordid.values.tolist()

Associating in Python will generate a wide table, which completely copy the data of two tables. For example, the result of r-pt is as follows:

The associative relation created in Python can also be considered as pre-association. For example, the r_ct_ct in the reuse of dimension table can be regarded as the pre-associated result, except that Python creates a large wide table into which all the associative fields are put.

8.1.2 Cycle association

There are two tables: employee information table and department information table:

Employee information table:

empid	name	dept	state	partner	…
1	FORD	6	New York	12	…
2	SCOTT	2	Florida	12	…
3	JAMES	7	California	11	…
…	…	…	…	…	…

Department information table:

deptid	name	manager	…
1	Administration	20	…
2	Finance	2	…
3	HR	162	…
…	…	…	…

Perform the following calculations based on the data above:

1. Self-association - list the names of all employees and their partners

2. Cycle association - find the Florida employee of Florida manager

SPL

	A	B
1	=file(“Employee_1.csv”).import@tc()
2	=A1.switch(partner,A1:empid)	/1. Self-association
3	=A2.new(name,partner.name:name_p)
4	=file(“Department2.csv”).import@tc()
5	=A4.switch(manager,A1:empid)
6	=A1.switch(dept,A5:deptid)	/2. Cycle association
7	=A6.select(state==“Florida”&&dept.manager.state==“Florida”).(name)	/2. Cycle association

SQL

1. Self-association - list the names of all employees and their partners

SELECT e.name AS name, p.name AS partner
FROM Employee_1 e
LEFT JOIN Employee_1 p ON e.partner = p.empid;

2. Cycle association - find the Florida employee of Florida manager

SELECT e.name AS employee_name
FROM Employee_1 e
JOIN Department2 d ON e.dept = d.deptid
JOIN Employee_1 m ON d.manager = m.empid
WHERE e.state = 'Florida' AND m.state = 'Florida';

Python

emp=pd.read_csv("../Employee_1.csv")
emp_s=pd.merge(emp,emp,left_on="partner",right_on="empid",suffixes=('', '_p'),how="left")
#1. Self-association
emp_s_name=emp_s[['name','name_p']]
dept2=pd.read_csv("../Department2.csv")
d_emp=pd.merge(dept2,emp,left_on="manager",right_on="empid")
#2. Cycle association
emp_d_emp=pd.merge(emp,d_emp,left_on="dept",right_on="deptid",suffixes=('', '_m'))
#2. Cycle association
Florida_emp_m=emp_d_emp[(emp_d_emp['state']=="Florida") & (emp_d_emp['state_m']=="Florida")].name.tolist()

8.2 Primary key association
Example codes for comparing SPL, SQL, and Python

SPL Official Website 👉 https://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL

SPL Learning Material 👉 https://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/2bkGwqTj

Youtube 👉 https://www.youtube.com/@esProc_SPL

example codes for comparing spl sql and python(31)

eBook