Alignment Operations between Ordered Sets
【Abstract】
We can perform alignment operations on two ordered sets. Such as comparing the sizes of two sets, checking whether two sets are equal, etc. Read this article to find ways of handling these operations conveniently and efficiently and their esProc sample scripts.
The alignment operations over ordered sets include comparison operators (>, <, ==, etc.) and basic arithmetic operations (represented by +, -, *, /, %, \). The sets under discussion are ordered sets whose members are already sorted out. The alignment operations on two ordered sets compare their members or do arithmetic over members in corresponding positions in order.
1. Size comparison
We can use operators ">" and "<" to compare corresponding members of two sets from the first pair in order. To compare [1,3,1] and [1,2,2], for instance, we begin from the first pair of members, [1,1]. They are equal, and then go on to compare the second pair. Since 3>2, there is no need to compare other members and we get [1,3,1] > [1,2,2].
【Example 1】 Based on the Olympic Medal table, find the Games when China is before Russia in the table. Below is part of the table:
Game |
Nation |
Medal |
Game |
30 |
USA |
46,29,29 |
30 |
30 |
China |
38,27,23 |
30 |
30 |
UK |
29,17,19 |
30 |
30 |
Russia |
24,26,32 |
30 |
30 |
Korea |
13,8,7 |
30 |
… |
… |
… |
… |
According to the Olympic rule of ranking, nations are ranked in descending order by the number of gold medals they have won. If nations have same number of gold medals, they will be ranked by the number of silver medals in descending order; and then by the number of copper medals for nations having same total silver medals. Nations getting same total copper medals come equal.
【SPL script】
A |
B |
|
1 |
=file("Olympic.csv").import@cqt() |
/ Import the Olympic medal table for all games |
2 |
=A1.run(Medal=Medal.split@c()) |
/ Split each Medal value into a set of medal totals by comma |
3 |
=A2.group(Game) |
/ Group A2’s records by game |
4 |
=A3.select(~.select(Nation=="China").Medal>~.select(Nation=="Russia").Medal) |
/ Use ">" to compare China’s medal total set and Russia’s in the order of gold, silver and copper, and select the games when China goes ahead |
5 |
=A4.(Game) |
/ List the eligible games |
A5’s result:
Game |
23 |
25 |
28 |
29 |
30 |
2. Equality comparison
It’s common to compare two ordered sets to find if they are equal. Specific scenarios include the comparison of file content or table values, and others (compare [1,2,3] and [2,1,3] to see if they are equal, for instance). Generally two sets are regarded as unequal if any members in the corresponding positions are not consistent. But in certain occasions the order is unimportant and the greatest concern is to find if both sets have same members.
【Example 2】 The following file stores random samples. We want to find if two random samples get same IDs. Below is part of the source data:
ID |
Predicted_Y |
Original_Y |
10 |
0.012388464367608093 |
0.0 |
11 |
0.01519899123978988 |
0.0 |
13 |
0.0007920238885061248 |
0.0 |
19 |
0.0012656367468159102 |
0.0 |
21 |
0.009460545997473379 |
0.0 |
23 |
0.024176791871681664 |
0.0 |
… |
… |
… |
【SPL script】
A |
B |
|
1 |
=file("p_old.csv").import@ct() |
/ Import the first file exported |
2 |
=file("p_new.csv").import@ct() |
/ Import the second file exported |
3 |
=cmp(A1.(ID),A2.(ID)) |
/Use cmp() function to compare if the IDs in the two files are equal (the corresponding member values are equal and their orders are consistent) |
A3’s result:
Member |
0 |
The result is 0, which means the two files contain completely same IDs.
If the order of IDs are different, we can use eq() function to find whether the two sets have same members.
A |
B |
|
3 |
=A1.(ID).eq(A2.(ID)) |
/Use eq() function to compare if the two files contain same IDs whose order can be different |
3. Basic arithmetic operations on sets
The alignment operations include the basic arithmetic calculations (+,-,*,/,%,\). For instance, the set of sales amounts of store A in 3 days is [2,3,4] and that of sales amount of store B is [3,1,3], so the set of total sales amounts of the two stores is [5,4,7].
【Example 3】 Calculate the daily relative yield between SZSE 300 (399007) and SZSE Component Index (399001) during the period of December 24 -26, 2019. Below is part of the source data:
Date |
Code |
Name |
Open |
Close |
Amount |
2020/2/18 |
399001 |
Shenzhen |
11244.7651 |
11306.4863 |
3.19E+11 |
2020/2/17 |
399001 |
Shenzhen |
10974.9328 |
11241.4993 |
3.12E+11 |
2020/2/14 |
399001 |
Shenzhen |
10854.4551 |
10916.3117 |
2.77E+11 |
2020/2/13 |
399001 |
Shenzhen |
10936.5011 |
10864.3222 |
2.87E+11 |
2020/2/12 |
399001 |
Shenzhen |
10735.0475 |
10940.7952 |
2.66E+11 |
… |
… |
… |
… |
… |
… |
【SPL script】
A |
B |
|
1 |
=connect("db") |
/Connect to data sourece |
2 |
=["399007","399001"].(A1.query("select * from StockIndex where code=? and date between'2019-12-23'and'2019-12-26'",~)) |
/ Import data of SZSE 300 and SZSE Component Index from Dec. 23-26, 2019; the data of Dec. 23 is for calculating the growth rate |
3 |
=A2.(~.calc(to(2,4),Close/Close[-1])) |
/ Calculate the growth rate for each of the 3 days |
4 |
=A3(1)--A3(2) |
/ Use "--" to perform alignment subtraction between the two set to get the relative yields; the sign ?? (?∈{+,-,*,/,%,\}) is used to perform alignment operations between sets |
A4’s result:
Member |
0.0031349096521252617 |
0.0011897141619391371 |
-4.4910504685946595E-4 |
Find more examples in SPL CookBook.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL
Chinese version