3.1 Equivalence grouping

 

2.5 Loop to generate and execute


1. Grouping & single aggregation on single column - Calculate the number of employees in each department

2. Group & COUNT DISTINCT - Calculate how many states the employees in each department are from

3. Grouping & multiple aggregations on single column - Calculate the number of employees, the average salary and the maximum age of each department

4. Grouping & single aggregation on multiple columns - Calculate the average salary of employees in each department and each state

5. Grouping & multiple aggregations on multiple columns - Calculate the average salary, maximum salary and average age of employees in each department and each state

6. Grouping & no aggregation - Find employees with the same birthday (excluding the year)

7. Grouping & complex aggregation - Calculate the salary difference between the oldest and youngest employees in each department

SPL

A B
1 =file(“EMPLOYEE.csv”).import@tc()
2 =A1.groups(DEPT;count(1):emp_no) /Grouping & single aggregation on single column
3 =A1.groups(DEPT;icount(STATE)) /Group & COUNT DISTINCT
4 =A1.groups(DEPT;count(1):emp_no,avg(SALARY):avg_salary,max(age(BIRTHDAY)):max_age) /Grouping & multiple aggregations on single column
5 =A1.groups(DEPT,STATE;avg(SALARY):avg_salary) /Grouping & single aggregation on multiple columns
6 =A1.groups(DEPT,STATE;avg(SALARY):avg_salary,max(SALARY):max_salary,avg(age(BIRTHDAY)):avg_age) /Grouping & multiple aggregations on multiple columns
7 =A1.group(month(BIRTHDAY),day(BIRTHDAY)).select(~.len()>1).conj() /Grouping & no aggregation
8 =A1.group(DEPT;(ma=~.minp(BIRTHDAY),mi=~.maxp(BIRTHDAY), ma.SALARY-mi.SALARY):SALARY_DIF) /Grouping & complex aggregation

SQL

1. Grouping & single aggregation on single column

SELECT DEPT, COUNT(*) AS NUM_EMPLOYEES FROM EMPLOYEE GROUP BY DEPT;

2. Group & COUNT DISTINCT

SELECT DEPT, COUNT(DISTINCT STATE) AS NUM_STATES FROM EMPLOYEE GROUP BY DEPT;

3. Grouping & multiple aggregations on single column

SELECT DEPT, COUNT(*) AS NUM_EMPLOYEES, AVG(SALARY) AS AVG_SALARY, MAX(TRUNC(MONTHS_BETWEEN(SYSDATE, BIRTHDAY)/12)) AS MAX_AGE FROM EMPLOYEE GROUP BY DEPT;

4. Grouping & single aggregation on multiple columns

SELECT DEPT, STATE, AVG(SALARY) AS AVG_SALARY FROM EMPLOYEE GROUP BY DEPT, STATE;

5. Grouping & multiple aggregations on multiple columns

SELECT DEPT, STATE, AVG(SALARY) AS AVG_SALARY, MAX(SALARY) AS MAX_SALARY, AVG(TRUNC(MONTHS_BETWEEN(SYSDATE, BIRTHDAY)/12)) AS AVG_AGE FROM EMPLOYEE GROUP BY DEPT, STATE;

6. Grouping & no aggregation

SQL doesnt have ‘Grouping & no aggregation’, so it has to adopt an indirect method.

7. Grouping & complex aggregation

WITH MAX_AGE_EMPLOYEE AS ( SELECT DEPT, BIRTHDAY, SALARY FROM ( SELECT DEPT,BIRTHDAY,SALARY, ROW_NUMBER() OVER (PARTITION BY DEPT ORDER BY BIRTHDAY ASC) AS RN FROM EMPLOYEE) WHERE RN = 1), MIN_AGE_EMPLOYEE AS ( SELECT DEPT, BIRTHDAY, SALARY FROM ( SELECT DEPT,BIRTHDAY,SALARY, ROW_NUMBER() OVER (PARTITION BY DEPT ORDER BY BIRTHDAY DESC) AS RN FROM EMPLOYEE) WHERE RN = 1) SELECT MAX_AGE_EMPLOYEE.DEPT, MAX_AGE_EMPLOYEE. SALARY - MIN_AGE_EMPLOYEE.SALARY AS SALARY_DIFF FROM MAX_AGE_EMPLOYEE JOIN MIN_AGE_EMPLOYEE ON MAX_AGE_EMPLOYEE.DEPT = MIN_AGE_EMPLOYEE.DEPT;

Python

emp = pd.read_csv('../EMPLOYEE.csv') # Grouping & single aggregation on single column dept_counts = emp['DEPT'].value_counts() #Group & COUNT DISTINCT dept_state_counts = emp.groupby('DEPT')['STATE'].nunique() #Grouping & multiple aggregations on single column dept_stats = emp.groupby('DEPT').agg( count=('DEPT', 'count'), avg_salary=('SALARY', 'mean'), max_age=('BIRTHDAY', lambda x: (pd.Timestamp('now') - pd.to_datetime(x)).astype('<m8[Y]').max())) #Grouping & single aggregation on multiple columns dept_state_salary = emp.groupby(['DEPT', 'STATE'])['SALARY'].mean() #Grouping & multiple aggregations on multiple columns dept_state_stats = emp.groupby(['DEPT', 'STATE']).agg( avg_salary=('SALARY', 'mean'), max_salary=('SALARY', 'max'), avg_age=('BIRTHDAY', lambda x: (pd.Timestamp('now') - pd.to_datetime(x)).astype('<m8[Y]').mean())) birthday_md=pd.to_datetime(emp['BIRTHDAY']).dt.strftime('%m-%d').sort_values() #Grouping&noaggregation emp_same_birthday=emp.loc[birthday_md.index] [emp.groupby(birthday_md).BIRTHDAY.transform('count')>1] #Grouping&complexaggregation def salary_diff(g): max_age = g['BIRTHDAY'].idxmin() min_age = g['BIRTHDAY'].idxmax() diff = g.loc[max_age]['SALARY']-g.loc[min_age]['SALARY'] return diff emp['BIRTHDAY']=pd.to_datetime(emp['BIRTHDAY']) salary_diff=emp.groupby('DEPT').apply(salary_diff)

3.2 Non-equivalence grouping
Example codes for comparing SPL, SQL, and Python