Create a New column That Contains a List of Values from Another Column Subsequent Rows
Question
I have a table like below:
TMP,ID,VALUE
2021-12-03 04:03:45,ID1,O
2021-12-03 04:03:46,ID1,P
2021-12-03 04:03:47,ID1,Q
2021-12-03 04:03:48,ID1,R
2021-12-03 04:03:49,ID1,
2021-12-03 04:03:50,ID1,S
2021-12-03 04:03:51,ID1,T
2021-12-04 11:09:03,ID2,A
2021-12-04 11:09:04,ID2,B
2021-12-04 11:09:05,ID2,C
2021-12-04 11:09:06,ID2,D
And want to create a new column that contains a list of values from another column subsequent rows like below:
TMP,ID,VALUE,LIST
2021-12-03 04:03:45,ID1,O,P,Q,R
2021-12-03 04:03:46,ID1,P,Q,R
2021-12-03 04:03:47,ID1,Q,R
2021-12-03 04:03:48,ID1,R,
2021-12-03 04:03:49,ID1,,
2021-12-03 04:03:50,ID1,S,T
2021-12-03 04:03:51,ID1,T,
2021-12-04 11:09:03,ID2,A,B,C,D
2021-12-04 11:09:04,ID2,B,C,D
2021-12-04 11:09:05,ID2,C,D
2021-12-04 11:09:06,ID2,D,
for copy paste: timestamp ID Value
2021-12-03 04:03:45 ID1 O
2021-12-03 04:03:46 ID1 P
2021-12-03 04:03:47 ID1 Q
2021-12-03 04:03:48 ID1 R
2021-12-03 04:03:49 ID1 NULL
2021-12-03 04:03:50 ID1 S
2021-12-03 04:03:51 ID1 T
2021-12-04 11:09:03 ID2 A
2021-12-04 11:09:04 ID2 B
2021-12-04 11:09:05 ID2 C
Answer
Data that is ordered by TMP is also ordered by ID logically. So, you can group rows first by ID; in each group, create a new group when the previous VALUE is null; in each subgroup, use the comma to join up VALUEs from the second to the last non-null VALUE to form a sequence and make it the value of the new column LIST. A SQL set is unordered, which makes computing process very complicated. You need to first create a marker column using the window function, perform a self-join by the marker column, and group rows and join up VALUE values to get the desired result. A common alternative is to fetch the original data out of the database and process it in Python or SPL. SPL, the open-source Java package, is easy to be integrated into a Java program and generates much simpler code. It can get it done with only two lines of code:
A |
|
1 |
=ORACLE.query("SELECT * FROM TAB ORDER BY 1") |
2 |
=A1.group@o(#2).conj(~.group@i(#3[-1]==null).run(tmp=~.(#3).select(~),~=~.derive(tmp.m(#+1:).concat@c():LIST))).conj() |
View SPL source code.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL
Chinese version