Split a CSV File into Separate Files by Specific Columns
Problem description & analysis
Below is the content of CSV file csv.csv:
X,Y,Z,Sc1,Sc2,Sc3,Sc4
1,0,0,5,7,9,10
0,1,1,6,8,4,0
0,0,1,3,3,8,2
We are trying to divide the file into four smaller files, in which X, Y, and Z fields are always retained. Below is the expected result:
File 1 | File 2 | File 3 | File 4
----------------------------------------------------------
[Sc1, X,Y,Z] | [Sc2, X,Y,Z] | [Sc3, X,Y,Z] | [Sc4, X,Y,Z]
[5, 1,0,0] | [7, 1,0,0] | [9, 1,0,0] | [10, 1,0,0]
[6, 0,1,1] | [8, 0,1,1] | [4, 0,1,1] | [0, 0,1,1]
[3, 0,0,1] | [3, 0,0,1] | [8, 0,0,1] | [2, 0,0,1]
Solution
We write the following script p1.dfx in esProc:
A |
|
1 |
=file("csv.csv").import@cw() |
2 |
=4.(file("File"/~/".csv")) |
3 |
=A2.run(~.write(A1.([~(3+A2.#),~(1),~(2),~(3)]).concat@nc())) |
Explanation:
A1 Import the CSV file as a sequence of sequences.
A2 The n files are File[n].csv, in which n is 4.
A3 Get desired columns from A1’s sequence of sequences in order, concatenate them into a string to write into the corresponding file.
See How to Call an SPL Script in Java to learn about the integration of an SPL script with a Java program.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL