8.2 Primary key association

 

8.1 Foreign key association


There are seven tables: store information table, order table, order detail table, payment collection table, store online rating table, store offline rating table, and store assessment score table. Perform the following calculations:

Store information table

storeid state
s101 California
s102 New York
s103 Florida

Order table

storeid orderid client location
s101 o10001 c104231 New York
s101 o10002 c107032 New York
s101 o10003 c108875 New York

Order detail table

storeid orderid productid price
s101 o10001 p1078 1273
s101 o10002 p1091 131
s101 o10002 p1008 780

Payment collection table

storeid orderid term payment
s101 o10001 1 318.25
s101 o10001 2 318.25
s101 o10001 3 318.25

Store online rating table

storeid online_score
s101 4
s102 1
s103 5

Store offline rating table

storeid offline_score
s101 10
s102 2
s103 2

Store assessment score table

storeid test_score
s101 4
s102 10
s103 8

1. Homo-dimension association - calculate the score given by client for a store (online score*0.4 + offline score*0.6)

2. Multiple homo-dimension tables - calculate the total score for a store (store assessment score*0.6+online score*0.1+offline score*0.3)

3. Primary-sub tables - query the total consumption of clients of each state in the store s150

4. Multi-layer primary-sub tables – aggregate the total sales of stores in each state

5. One primary table and multiple sub tables - find the order whose payments are not fully collected

SPL

A B
1 =file(“online_score.csv”).import@tc()
2 =file(“offline_score.csv”).import@tc()
3 =join@f(A1:online,storeid;A2:offline,storeid) /Homo-dimension association
4 =A3.new(if(online,online.storeid,offline.storeid):storeid,online.online_score:online_score,offline.offline_score:offline_score,0.4*online_score+0.6*offline_score:total)
5 =file(“test_score.csv”).import@tc().keys(storeid)
6 =join@f(A1:online,storeid;A2:offline,storeid;A5:test) /Multiple homo-dimension tables
7 =A6.new([online.storeid,offline.storeid,test.storeid].max():storeid,online.online_score:online_score,offline.offline_score:offline_score,test.test_score:test_score,0.6*test_score+0.1*online_score+0.3*offline_score:total)
8 =file(“Detail.csv”).import@tc()
9 =file(“Orders.csv”).import@tc()
10 =A8.select(storeid==“s150”)
11 =A9.select(storeid==“s150”)
12 =A10.groups(orderid;sum(price):sprice)
13 =join(A11:order,orderid;A12:consume,orderid) /Primary-sub association
14 =A13.groups(order.location;sum(consume.sprice):consume)
15 =file(“Store.csv”).import@tc()
16 =A8.groups(storeid,orderid;sum(price):sprice)
17 =join(A9:order,[storeid,orderid];A16:detail_sum) /Multi-layer primary-sub tables
18 =A17.groups(order.storeid;sum(detail_sum.sprice):sales)
19 =join(A15:store,storeid;A18:store_amount)
20 =A19.groups(store.state;sum(store_amount.sales):sales)
21 =file(“Payment.csv”).import@tc()
22 =A21.groups(storeid,orderid;sum(payment):payment)
23 =join(A9:order,[storeid,orderid];A16:sprice;A22:payment) /One primary table & multiple sub tables
24 1e-6
25 =A23.select(sprice.sprice-payment.payment>A24).(order)

SQL

1. Homo-dimension association

SELECT COALESCE(o.storeid, f.storeid) AS storeid,
    (COALESCE(o.online_score, 0)*0.4 + COALESCE(f.offline_score, 0)*0.6) AS customer_score
FROM online_score o
FULL JOIN offline_score f ON o.storeid = f.storeid;

2. Multiple homo-dimension tables

SELECT COALESCE(t.storeid, o.storeid, f.storeid) AS storeid,
    (COALESCE(t.test_score, 0)*0.6 + COALESCE(o.online_score, 0)*0.1 + 
        COALESCE(f.offline_score, 0)*0.3) AS total_score
FROM test_score t
FULL JOIN online_score o ON t.storeid = o.storeid
FULL JOIN offline_score f ON t.storeid = f.storeid;

3. Primary-sub tables

SELECT o.location, SUM(d.price) AS total_consumption
FROM Orders o
JOIN Detail d ON o.orderid = d.orderid AND o.storeid = d.storeid
WHERE o.storeid = 's150'
GROUP BY o.location;

4. Multi-layer primary-sub tables

SELECT s.state, SUM(d.price) AS total_sales
FROM Store s
JOIN Orders o ON s.storeid = o.storeid
JOIN Detail d ON o.orderid = d.orderid AND o.storeid = d.storeid
GROUP BY s.state;

5. One primary table and multiple sub tables

SELECT o.storeid,o.orderid, p. total_payment,d. total_price
FROM Orders o
JOIN (
    SELECT p.storeid, p.orderid, SUM(p.payment) AS total_payment
    FROM Payment p
    GROUP BY p.storeid, p.orderid) p
ON o.storeid = p.storeid AND o.orderid = p.orderid
JOIN (
    SELECT d.storeid, d.orderid, SUM(d.price) AS total_price
    FROM Detail d
    GROUP BY d.storeid, d.orderid) d
ON o.storeid = d.storeid AND o.orderid = d.orderid
WHERE d.total_price-p.total_payment>0.000001
ORDER BY o.storeid,o.orderid;

Python

#Homo-dimensionassociation
onscore=pd.read_csv("../online_score.csv",index_col='storeid')
ofscore=pd.read_csv("../offline_score.csv",index_col='storeid')
scores=pd.merge(onscore,ofscore,on="storeid",how="outer").fillna(0)
scores['total']=0.6*scores['offline_score']+0.4*scores['online_score']
#Multiplehomo-dimensiontables
tscore=pd.read_csv('../test_score.csv',index_col='storeid')
store_score=pd.concat([onscore,ofscore,tscore],axis=1,join='outer',sort=True).fillna(0).reset_index()
store_score['total']=0.6*store_score.test_score+0.3*store_score.offline_score+0.1*store_score.online_score
#Primary-subtables
detail=pd.read_csv("../Detail.csv")
order=pd.read_csv("../Orders.csv")
detail150=detail.query("storeid=='s150'")
order150=order.query("storeid=='s150'")
order_detail=pd.merge(order150,detail150,on="orderid",how="inner")
loc_consume=order_detail.groupby("location").price.sum()
#Multi-layerprimary-subtables
store=pd.read_csv("../Store.csv")
detail_sum=detail.groupby(['storeid','orderid']).price.sum()
order_detai=pd.merge(order,detail_sum,on=['storeid','orderid'])
store_amount=order_detai.groupby('storeid',as_index=False).price.sum()
store_inf=pd.merge(store,store_amount,on=["storeid"])
state_sale=store_inf.groupby('state',as_index=False).price.sum()
#Oneprimarytableandmultiplesubtables
pay=pd.read_csv("../Payment.csv")
detail_order=detail.groupby(['storeid','orderid']).price.sum()
pay_order=pay.groupby(['storeid','orderid']).payment.sum()
order=order.set_index(['storeid','orderid'])
order_d_p=order.join([pay_order,detail_order])
nopay_order=order_d_p.query('price-payment>1e-6')

8.3 Mixed association
Example codes for comparing SPL, SQL, and Python