From SQL to SPL: Calculate a pair of minimum values that meet the criteria within the group
A table stores events that occur for multiple accounts on multiple dates.
Row |
Account Number |
Date |
1 |
1001 |
2011-01-10 |
2 |
1001 |
2011-02-01 |
3 |
1001 |
2011-02-20 |
4 |
1001 |
2011-02-22 |
5 |
2001 |
2011-04-11 |
6 |
2001 |
2012-01-01 |
7 |
2001 |
2012-01-30 |
8 |
2001 |
2012-02-09 |
Now we need to find a pair of events that meet the criteria under each account, namely: event a with the earliest date, and event b with the earliest date among events that are more than 30 days away from event a.
Row |
Account Number |
Date |
1 |
1001 |
2011-01-10 |
3 |
1001 |
2011-02-20 |
5 |
2001 |
2011-04-11 |
6 |
2001 |
2012-01-01 |
We can sort by date and group by account, get the first record a from each group, then filter out all records that are more than 30 days away from event a, and also get the first record b. It is not too difficult for SQL to extract a using window functions, but it cannot keep the subset and continue filtering b. This requires a shift in thinking. After getting a, use the original table and a to join and calculate all events that are more than 30 days away from event a. Then use a similar method to find a to obtain b. Finally, union a and b and sort it to obtain the result. SQL involves multiple CTE clauses, which is cumbersome. There are two times to retrieve the first record within the group, and SQL does not have a natural sequence number, and it is also troublesome to create an additional sequence number before retrieving it.
After SPL grouping, the grouped subsets can be retained, so this task can be implemented using the previous approach. SPL naturally supports sequence numbers, which can easily select the first record, the first record within the group, and the first record of the filtered result.
A |
|
1 |
=dbConn.query("select Row, ’Account Number’, Date from tb order by Date") |
2 |
=A1.group(#2) |
3 |
=A2.conj(~1 | ~.select@1((#3 - A2.~1.#3)>30)) |
A1: Load data from the database and sort it by date.
A2: Group by account using the group function, with #2 representing the second field.
A3: Select the first record from each group, and then filter out records that are more than 30 days away from the first record, and also select the first record; Union 2 records; Finally, union the calculation results of each group. A2.~1 represents the first record within the group, and can be abbreviated as ~1 if there is no misunderstanding about its scope of action. The select function is used for filtering, where @1 represents getting the first filtering result.
Question source:https://stackoverflow.com/questions/78306966/select-rows-based-on-desired-date-difference-from-first-row-by-group-sql
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL