From SQL to SPL: Search for the closest matching record within the group

The table mytable in the MS SQL database has one ConfirmationStarted and multiple Closed statuses for each ID.

CreatedAt

ID

NewStatus

2022-05-25 23:17:44.000

147

Active

2022-05-28 05:59:02.000

147

Closed

2022-05-30 20:48:53.000

147

Active

2022-06-18 05:59:01.000

147

Closed

2022-06-21 20:09:48.000

147

Active

2022-06-25 05:59:01.000

147

Closed

2022-07-13 00:02:47.000

147

ConfirmationStarted

2022-07-15 15:33:30.000

147

ConfirmationDone

2022-08-25 05:59:01.000

147

Closed

2023-03-08 13:34:57.000

1645

Draft

2023-03-22 19:58:51.000

1645

Active

2023-04-29 05:59:02.000

1645

Closed

2023-05-08 14:50:29.000

1645

Awarded

2023-05-08 14:53:34.000

1645

ConfirmationStarted

2023-05-08 17:53:55.000

1645

ConfirmationDone

Now we need to find the record closest to ConfirmationStarted among all the Closed records before ConfirmationStarted in each ID, and retrieve the ID and time fields of the record.

ID

xdate

147

2022-06-25 05:59:01.000

1645

2023-04-29 05:59:02.000

SQL solution


WITH cte AS (
    SELECT ID, CreatedAt, NewStatus,
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedAt DESC) AS rn
    FROM mytable
    WHERE NewStatus = 'Closed'
    AND CreatedAt < (
        SELECT CreatedAt FROM mytable AS sub
        WHERE sub.ID = mytable.ID AND sub.NewStatus = 'ConfirmationStarted'
    )
)
SELECT ID, CreatedAt as xdate
FROM cte
WHERE rn = 1
ORDER BY ID;
With cte AS (
    SELECT ID, CreatedAt, NewStatus,
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedAt DESC) AS rn
    FROM mytable
    WHERE NewStatus = 'Closed'
    AND CreatedAt < (
        SELECT CreatedAt FROM mytable AS sub
        WHERE sub.ID = mytable.ID AND sub.NewStatus = 'ConfirmationStarted'
    )
)
SELECT ID, CreatedAt as xdate
FROM cte
WHERE rn = 1
ORDER BY ID;

SQL does not have natural sequence numbers, so it needs to generate sequence numbers using window functions first. After SQL grouping, it must aggregate immediately and records within the group cannot be filtered. It can only be solved in a roundabout way by filtering repeatedly using multi-level subqueries. The overall code is a bit cumbersome and difficult to understand.

SPL has natural sequence numbers and provides rich position related calculations. SPL grouping can retain subsets after grouping, making it easier to process data within the group.


 A

1

"select ID,CreatedAt,NewStatus from mytable order by CreatedAt”)=mssql.query(

2

=A1.group(ID)

3

=A2.(~.select@c(NewStatus!="ConfirmationStarted").select@z1(NewStatus=="Closed"))

4

=A3.new(ID,CreatedAt:xdate)

A1 Load data from the database and sort it by time.

A2 Group by ID, but do not aggregate.

A3 Filter each group of data, first find the records before ConfirmationStarted, and then filter out Closed from them, getting the last one. The select function is used for conditional filtering, which supports position related calculations during filtering, @c represents starting from the first record that makes the condition true and stopping when encountering a record that makes the condition false. @1 represents getting the first piece of the results, and @z represents filtering from back to front.

A2-A4 can be combined into one statement: =A1.group(ID;~.select@c(NewStatus!="ConfirmationStarted").select@z1(NewStatus=="Closed").CreatedAt:xdate)

Question sourcehttps://stackoverflow.com/questions/78279330/how-do-i-find-the-date-before-a-value-changes-in-t-sql