6.21 Grouped subsets: filter grouped subsets by aggregate value
Strictly speaking, grouping and aggregation are two independent actions though they always appear side by side. This gives an impression that the two must come together. Actually, the duo sets a limit for grouping operations because more than the several SQL aggregate functions can be performed after the grouping operation.
ID | NAME | BIRTHDAY | STATE | DEPT | SALARY |
---|---|---|---|---|---|
1 | Rebecca | 1974/11/20 | California | R&D | 7000 |
2 | Ashley | 1980/07/19 | New York | Finance | 11000 |
3 | Rachel | 1970/12/17 | New Mexico | Sales | 9000 |
4 | Emily | 1985/03/07 | Texas | HR | 7000 |
5 | Ashley | 1975/05/13 | Texas | R&D | 16000 |
… | … | … | … | … | … |
Find employees whose ages are below the average age of the department.
SPL allows defining an operation on each grouped subsets after the grouping operation in A.group() function. The operation can be not only the existing SQL SUM, COUNT, etc. but a more complex one.
SPL script:
A | |
---|---|
1 | =T(“Employee.csv”) |
2 | =A1.group(DEPT; (a=~.avg(age(BIRTHDAY)), ~.select(age(BIRTHDAY)<a)):YOUNG) |
3 | =A2.conj(YOUNG) |
A1 Import Employee table.
A2 Group A1’s table by department and select records where the employee age is below the average age of each group. In an aggregate operation defined in A.group() function, we can use a temporary variable to make the computation simpler and easier to understand.
A3 Concatenate the selected records.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL