Split a Huge CSV File into Multiple Smaller CSV Files
Problem description & analysis
Below are the first rows of the CSV file sample.csv:
v2aowqhugt,q640lwdtat,8cqw2gtm0g,ybdncfeue8,3tzwyiouft,…
f0ewv2v00z,x2ck96ngmd,9htr2874n5,fx430s8wqy,tw40yn3t0j,…
p2h6fphwco,kldbn6rbzt,8okyllngxz,a8k9slqfms,bqz5fb7cm9,…
st63tcbfv8,2n862vqzww,2equ0ydeet,0x5tidunc6,npis28avpj,…
bn1u58s39a,mg7064jlrb,edyj3t4s95,zvuf9n29ai,1m0yn8uh0n,…
…
The file contains too much data to be loaded into memory as a whole; at most 100000 rows fit into the available memory at a time. So we need to split the file into multiple smaller CSV files of 100000 rows each, as shown below:
sample1.csv 100000 rows
sample2.csv 100000 rows
…
sample[n].csv at most 100000 rows
Solution
Write the following script p1.dfx in esProc:
| | A | B |
|---|---|---|
| 1 | =file("sample.csv").cursor() | |
| 2 | for A1,100000 | =file("sample"/#A2/".csv").export(A2) |
Explanation
A1 Create a cursor on the original CSV file.
A2 Loop over A1's cursor, reading 100000 rows each time.
B2 Export A2's rows to sample[n].csv; #A2 is the loop number, which starts from 1.
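For readers who want to compare with a general-purpose language, the same chunked split can be sketched in plain Python using only the standard library. This is an equivalent illustration, not the esProc implementation; the function name split_csv and the chunk_size parameter are our own choices:

```python
import csv
import itertools

def split_csv(src, chunk_size=100000, prefix="sample"):
    """Split src into prefix1.csv, prefix2.csv, ... with at most
    chunk_size rows per output file."""
    with open(src, newline="") as f:
        reader = csv.reader(f)
        # Like the esProc cursor, the reader streams rows instead of
        # loading the whole file into memory.
        for n in itertools.count(1):
            rows = list(itertools.islice(reader, chunk_size))
            if not rows:
                break
            with open(f"{prefix}{n}.csv", "w", newline="") as out:
                csv.writer(out).writerows(rows)
```

Calling split_csv("sample.csv") produces sample1.csv, sample2.csv, ... just like the script above; only one chunk of rows is held in memory at any moment.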
Read How to Call an SPL Script in Java to learn how to integrate the script code into a Java program.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
YouTube 👉 https://www.youtube.com/@esProc_SPL