3.3 Get union of two sets

 

The union of set A and set B is made up of members that are in set A or in set B. For instance, the union of [1,2] and [1,3] is [1,2,3]. Note that their common member 1 only appears once in the union.

Here is an example. A business has two sales channels – online and offline stores. Their sales data is stored in SALES_ONLINE table and SALES_STORE table respectively. The task is to get products whose online cumulative sales amount is above 10000 or that are purchased more than 5 times offline in the year 2014. Below is part of data in the two tables:

SALES_ONLINE:

ID CUSTOMERID ORDERDATE PRODUCTID AMOUNT
1 HANAR 2014/01/01 17 3868.6
2 ERNSH 2014/01/03 42 1005.9
3 LINOD 2014/01/06 64 400.0
4 OTTIK 2014/01/07 10 1194.0
5 OCEAN 2014/01/09 41 319.2

SALES_STORE:

ID CUSTOMERID ORDERDATE SELLERID PRODUCTID AMOUNT
1 EASTC 2014/01/01 1 27 3063.0
2 ERNSH 2014/01/02 8 70 2713.5
3 MAGAA 2014/01/03 2 74 1675.0
4 SAVEA 2014/01/07 7 45 2018.2
5 FOLIG 2014/01/08 8 30 1622.4

SPL can use character “&” to calculate union of sets. A&B, for instance, calculates union of set A and set B.

SPL script:

A
1 =T(“SalesOnline.xlsx”).select(year(ORDERDATE)==2014)
2 =T(“SalesStore.xlsx”).select(year(ORDERDATE)==2014)
3 =A1.groups(PRODUCTID; sum(AMOUNT):AMOUNT).select(AMOUNT>10000)
4 =A2.groups(PRODUCTID; count(~):COUNT).select(COUNT>5)
5 =A3.(PRODUCTID)&A4.(PRODUCTID)

A1: Import SALES_ONLINE table from the original file and select sales records of the year 2014.
A2: Import SALES_STORE table from the original file and select sales records of the year 2014.
A3: Group A1 by PRORDUCTID and calculate the total sales amount for each product in the online store, and select records where the total amount exceeds 10000.
A4: Group A2 by PRORDUCTID and calculate the total sales amount for each product in the offline store, and select records where the product is purchased more than 5 times.
A5: Use operator “&” to calculate the union of A3 and A4.