2.15 Grouping: distinct count
Perform distinct count when summarizing data in a table sequence. Now we are trying to find the most suitable field for cooperating with Pclass to form the primary key.
PassengerId | Survived | Pclass | Name | Sex | Age |
---|---|---|---|---|---|
1 | 0 | 3 | “Braund, Mr. Owen Harris” | male | 22 |
2 | 1 | 1 | “Cumings, Mrs. John Bradley” | female | 38 |
3 | 1 | 3 | “Heikkinen, Miss. Laina” | female | 26 |
4 | 1 | 1 | “Futrelle, Mrs. Jacques Heath” | female | 35 |
5 | 0 | 3 | “Allen, Mr. William Henry” | male | 35 |
6 | 0 | 3 | “Moran, Mr. James” | male | |
7 | 0 | 1 | “McCarthy, Mr. Timothy J” | male | 54 |
… | … | … | … | … | … |
SPL script:
A | B | C | |
---|---|---|---|
1 | =T(“titanic_train.xlsx”) | ||
2 | =A1.group(Pclass) | ||
3 | =A1.fno() | >result=[] | |
4 | for A3 | =A2.new(~.field(A4).icount():Dcount,~.len():Dlen) | |
5 | =B4.select(DCount==Dlen) | ||
6 | if B5.len()==A2.len() | >result.insert(0,A1.fname(A4)) | |
7 | return result |
A1 Load data table.
A2 Group table by Pclass.
A3 Get the ordinal number of each field.
A4 Loop each column to
B4 Use icount() to calculate the non-duplicate values of the current field in each group;
B5 Get records where the number of distinct values of the current field is equivalent to the length of the current group;
B6 If the number of records in B5 is equivalent to the number of groups in A2, the current field does not have duplicate values in all groups;
C6 Add name of the current field to the result set.
A7 Return result set.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL