6.22 Grouped subsets: filter grouped subsets and group the selected subsets
Select states which more than 50 employees are based and calculate average salary of each department in these states.
ID | NAME | BIRTHDAY | STATE | DEPT | SALARY |
---|---|---|---|---|---|
1 | Rebecca | 1974/11/20 | California | R&D | 7000 |
2 | Ashley | 1980/07/19 | New York | Finance | 11000 |
3 | Rachel | 1970/12/17 | New Mexico | Sales | 9000 |
4 | Emily | 1985/03/07 | Texas | HR | 7000 |
5 | Ashley | 1975/05/13 | Texas | R&D | 16000 |
… | … | … | … | … | … |
A grouping operation itself is complex. Retaining grouped subsets and reusing them can increase computing efficiency. A.group() function retains grouped subsets after grouping so that we can further perform grouping & aggregation on the grouped subsets.
SPL script:
A | |
---|---|
1 | =T(“Employee.csv”) |
2 | =A1.group(STATE).select(~.len()>50) |
3 | =A2.conj(~.groups(DEPT; avg(SALARY):AVG_SALARY).derive(A2.~.STATE:STATE)) |
A1 Import Employee table.
A2 Group A1’s table by state and get the grouped subsets containing more than 50 employees.
A3 In each subset of state, group records by department, calculate average salary in each department and concatenate all result sets.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL