From SQL to SPL: Calculate the number of intersections between adjacent subsets after grouping
A certain database table records the execution status of a project, with multiple people participating in the project every day, and one person can participate in multiple tasks of the project in one day.
EMP_ID |
EMP_DATE |
EMP_TASK |
A1 |
04-01-2024 |
345 |
A2 |
04-01-2024 |
546 |
A3 |
04-01-2024 |
232 |
A4 |
04-01-2024 |
8000 |
A5 |
04-01-2024 |
2344 |
A1 |
04-02-2024 |
456 |
A2 |
04-02-2024 |
9280 |
A3 |
04-02-2024 |
324 |
A2 |
04-02-2024 |
754 |
A8 |
04-02-2024 |
75 |
A2 |
04-03-2024 |
400 |
A3 |
04-03-2024 |
234 |
A3 |
04-04-2024 |
100 |
Now we need to calculate: how many people have also participated in the project the previous day for every day. That is, after grouping by date, calculate the number of intersections between daily and previous day's personnel. The first day is special, assuming that everyone participated in the project the previous day.
EMP_DATE |
COUNT |
04-01-2024 |
5 |
04-02-2024 |
3 |
04-03-2024 |
2 |
04-04-2024 |
1 |
We can first group by date, and then perform intersection operations on the grouped subsets, which requires retaining the subsets after grouping. But after SQL grouping, it must aggregate immediately, and subsets cannot be retained, and calculating intersection is also impossible. It needs to do it another way, to group by person first and determine whether each person appeared on a certain date and the previous day, and then group and aggregate these dates, it involves multiple layers of nesting and window functions, which is very troublesome.
After grouping in SPL, subsets can be retained and adjacent subsets can be referenced. Code can be written directly according to the idea.
A |
|
1 |
=dbConn.query("select distinct EMP_ID,EMP_DATE from tb") |
2 |
=A1.group(EMP_DATE) |
3 |
=A2.new(EMP_DATE,if(#==1,~.len(),(~.(EMP_ID)^~[-1].(EMP_ID)).len()):COUNT) |
A1: Load data from the database and deduplicate EMP-ID.
A2: Group by date, but do not aggregate.
A3: Create a new two-dimensional table based on the grouping results. If the current group is Group 1, directly return the number of members in the group; If it is not the first group, then calculate the intersection of the EMP-IDs of the current group and the previous group, and then calculate the number of members. ^ is used for calculating intersection, [-1] represents the previous group.
Question source:https://stackoverflow.com/questions/78324132/how-do-i-perform-recursive-search-in-oracle
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL