esProc One-liners: Split a Field Value into N Records
Key words: field value splitting split one row into N rows
To split a field value into N strings by a separator and unfold the row into N rows in SQL is very inconvenient.
That’s because SQL doesn’t support the set objects. So it can’t expand one row into multiple rows according to a set of strings split from a value. The SQL way of doing this is like this. Get the maximum number of strings after value splitting (M), create a one-column, M-row temporary table (T2) with column named lv, in which values are 1,2,……,M, and perform a cross-join between T2 and the original table over the (T2.lv)th string after value splitting. This SQL query contains multiple complicated nested subqueries. Moreover, databases have their own functions for string splitting, so they write different SQL queries for doing this.
Here’s one example. Based on elective course table COURSES below, we need to find how many courses each student chooses.
COURSE |
STUDENTS |
Chinese |
Tom,Kate,John,Jimmy |
Russia |
Tom,Cart,Jimmy |
Spanish |
Kate,Joan,Cart |
Portuguese |
John,Tom |
History |
Tom,Cart,Kate |
Music |
Kate,Joan,Tom |
The expected result:
STUDENT |
NUM |
Tom |
5 |
Kate |
4 |
Cart |
3 |
Jimmy |
2 |
Joan |
2 |
John |
2 |
Oracle SQL does it in this way:
SELECT STUDENT, COUNT(*) NUM FROM
(SELECT T1.COURSE, REGEXP_SUBSTR(T1.STUDENTS, '[^,]+', 1, T2.LV ) STUDENT
FROM COURSES T1,
( SELECT LEVEL LV
FROM (SELECT MAX(REGEXP_COUNT(A.STUDENTS, '[^,]+', 1)) R_COUNT
FROM COURSES A
) B
CONNECT BY LEVEL <= B.R_COUNT) T2
WHERE REGEXP_SUBSTR(T1.STUDENTS, '[^,]+', 1, T2.LV) IS NOT NULL
) C
GROUP BY STUDENT
ORDER BY NUM DESC;
C is the temporary table. So the SQL query is multilevel and difficult to read.
The program would be concise and simple if we could use esProc SPL to handle it. It is a one-liner:
connect("mydb").query("SELECT * FROM COURSES").news(STUDENTS.split@c();~:STUDENT ).groups(STUDENT;count(1):NUM).sort(-NUM)
The Structured Process Language (SPL) supports set objects and provides the function for expanding a row to multiple rows according to a set. The code is brief, clear, simple and easy to understand. Most importantly, it enables universal syntax for producing consistent queries over various data sources.
The SPL sets function library includes intersection, difference, union & concatenation, aggregates and loops. Refer to SPL Set to learn more.
SPL is integration-friendly with a Java program. Read How to Call an SPL Script in Java to learn details.
About how to work with esProc, read Getting Started with esProc.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL