From SQL to SPL: Search for the closest matching record within the group
The table mytable in the MS SQL database has one ConfirmationStarted and multiple Closed statuses for each ID.
CreatedAt |
ID |
NewStatus |
2022-05-25 23:17:44.000 |
147 |
Active |
2022-05-28 05:59:02.000 |
147 |
Closed |
2022-05-30 20:48:53.000 |
147 |
Active |
2022-06-18 05:59:01.000 |
147 |
Closed |
2022-06-21 20:09:48.000 |
147 |
Active |
2022-06-25 05:59:01.000 |
147 |
Closed |
2022-07-13 00:02:47.000 |
147 |
ConfirmationStarted |
2022-07-15 15:33:30.000 |
147 |
ConfirmationDone |
2022-08-25 05:59:01.000 |
147 |
Closed |
2023-03-08 13:34:57.000 |
1645 |
Draft |
2023-03-22 19:58:51.000 |
1645 |
Active |
2023-04-29 05:59:02.000 |
1645 |
Closed |
2023-05-08 14:50:29.000 |
1645 |
Awarded |
2023-05-08 14:53:34.000 |
1645 |
ConfirmationStarted |
2023-05-08 17:53:55.000 |
1645 |
ConfirmationDone |
Now we need to find the record closest to ConfirmationStarted among all the Closed records before ConfirmationStarted in each ID, and retrieve the ID and time fields of the record.
ID |
xdate |
147 |
2022-06-25 05:59:01.000 |
1645 |
2023-04-29 05:59:02.000 |
SQL solution:
WITH cte AS (
SELECT ID, CreatedAt, NewStatus,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedAt DESC) AS rn
FROM mytable
WHERE NewStatus = 'Closed'
AND CreatedAt < (
SELECT CreatedAt FROM mytable AS sub
WHERE sub.ID = mytable.ID AND sub.NewStatus = 'ConfirmationStarted'
)
)
SELECT ID, CreatedAt as xdate
FROM cte
WHERE rn = 1
ORDER BY ID;
With cte AS (
SELECT ID, CreatedAt, NewStatus,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedAt DESC) AS rn
FROM mytable
WHERE NewStatus = 'Closed'
AND CreatedAt < (
SELECT CreatedAt FROM mytable AS sub
WHERE sub.ID = mytable.ID AND sub.NewStatus = 'ConfirmationStarted'
)
)
SELECT ID, CreatedAt as xdate
FROM cte
WHERE rn = 1
ORDER BY ID;
SQL does not have natural sequence numbers, so it needs to generate sequence numbers using window functions first. After SQL grouping, it must aggregate immediately and records within the group cannot be filtered. It can only be solved in a roundabout way by filtering repeatedly using multi-level subqueries. The overall code is a bit cumbersome and difficult to understand.
SPL has natural sequence numbers and provides rich position related calculations. SPL grouping can retain subsets after grouping, making it easier to process data within the group.
A |
|
1 |
"select ID,CreatedAt,NewStatus from mytable order by CreatedAt”)=mssql.query( |
2 |
=A1.group(ID) |
3 |
=A2.(~.select@c(NewStatus!="ConfirmationStarted").select@z1(NewStatus=="Closed")) |
4 |
=A3.new(ID,CreatedAt:xdate) |
A1: Load data from the database and sort it by time.
A2: Group by ID, but do not aggregate.
A3: Filter each group of data, first find the records before ConfirmationStarted, and then filter out Closed from them, getting the last one. The select function is used for conditional filtering, which supports position related calculations during filtering, @c represents starting from the first record that makes the condition true and stopping when encountering a record that makes the condition false. @1 represents getting the first piece of the results, and @z represents filtering from back to front.
A2-A4 can be combined into one statement: =A1.group(ID;~.select@c(NewStatus!="ConfirmationStarted").select@z1(NewStatus=="Closed").CreatedAt:xdate)
Question source:https://stackoverflow.com/questions/78279330/how-do-i-find-the-date-before-a-value-changes-in-t-sql
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL