8.3 Mixed association

 

8.2 Primary key association


Based on the order table, order detail table, product information table, employee information table, travel information table, client information table, and city information table, calculate the sales of California products sold in each state by post-90s sales representative who travels for more than 10 days.

The associative relations are shown below:

SPL

A
1 =file(“Employee4.csv”).import@tc()
2 =file(“Travel4.csv”).import@tc()
3 =A2.join(empid,A1:empid,birthday)
4 =A3.select((y=year(birthday),y>=1990&&y<2000&&time>=10))
5 =file(“Client4.csv”).import@tc()
6 =file(“City4.csv”).import@tc()
7 =A5.join(city,A6:cityid,state)
8 =file(“Product4.csv”).import@tc()
9 =A8.join(city,A6:cityid,state)
10 =file(“Detail4.csv”).import@tc()
11 =file(“Order4.csv”).import@tc()
12 =A10.join(productid, A9:productid,state:product_state)
13 =A12.group(orderid)
14 =A11.switch(orderid, A13:orderid;saleid, A4:empid;clientid, A7:clientid)
15 =A14.select(saleid).new(saleid.empid:empid,saleid.name:sale_name,clientid.state:sale_location,orderid.select(product_state==“California”).sum(price):price)
16 =A15.groups(empid,sale_name,sale_location;sum(price):price).select(price)

When facing such complex association relations, SPL can handle easily as its homo-dimension, primary-sub and foreign key associations are all very clear. SPL can also associate two associations simultaneously.

SQL

SELECT e.empid,e.name,c2.state,sum(d.price) AS total_amount
FROM Order4 o
JOIN Employee4 e ON o.saleid=e.empid
JOIN Travel4 t ON e.empid = t.empid
JOIN Detail4 d ON o.orderid = d.orderid
JOIN Product4 p ON d.productid = p.productid
JOIN City4 c ON p.city = c.cityid
JOIN Client4 cl ON o.clientid = cl.clientid
JOIN City4 c2 ON cl.city = c2.cityid
WHERE e.birthday >= TO_DATE('1990-01-01', 'YYYY-MM-DD')
AND e.birthday < TO_DATE('2000-01-01', 'YYYY-MM-DD')
AND t.time >= 10 
AND c.state = 'California'
GROUP BY e.empid,e.name,c2.state;

Python

emp4 = pd.read_csv("../Employee4.csv")
trv4 = pd.read_csv("../Travel4.csv")
emp_inf = pd.merge(emp4,trv4,on=["empid","name"])
years = pd.to_datetime(emp_inf.birthday).dt.year
emp_inf_c = emp_inf[(years>=1990) & (years<2000)&(emp_inf.time>=10)]
clt4 = pd.read_csv("../Client4.csv")
city4 = pd.read_csv("../City4.csv")
sale_location = pd.merge(clt4,city4,left_on='city',right_on='cityid')
pdt4 = pd.read_csv("../Product4.csv")
pdt_location = pd.merge(pdt4,city4,left_on='city',right_on='cityid')
detail4 = pd.read_csv("../Detail4.csv")
order4 = pd.read_csv("../Order4.csv")
detail_pdt = pd.merge(detail4,pdt_location,on='productid',how="left")
order_sale_location = pd.merge(order4,sale_location,on='clientid',how="left")
order_sale_location_emp = pd.merge(order_sale_location,emp_inf_c,
    left_on='saleid',right_on='empid',how="left",suffixes=('_c', '_e'))
order_inf = order_sale_location_emp[order_sale_location_emp.empid.notnull()]
order_detail = pd.merge(order_inf,detail_pdt,on='orderid',how="left",suffixes=('_s', '_p'))
order_detail_Hljp = order_detail[order_detail.state_p=="California"]
res = order_detail_Hljp.groupby(['empid','name_e','state_s'],as_index=False).price.sum()

When the association relations are too complex, it is best for Python to handle one association at a time to avoid error.


What to use for data analysis programming, SQL, Python, or esProc SPL?
Example codes for comparing SPL, SQL, and Python