Batch Convert a Large JSON File Where Each Row Is a Record into a CSV File
Problem description & analysis
We have a text file json.txt. Below is part of the data:
{"name":"abc1","age":20}
{"name":"abc2","age":20}
{"name":"abc3","age":20...
{"name":"abc4","age":20}
{"name":"abc5","age":20}
…
Each row is a JSON string. Yet there is dirty data in the file, such as row 3. We are trying to remove the dirty data and export the rest of the data to a CSV file. Below is a part of the desired result:
name |
age |
abc1 |
20 |
abc2 |
20 |
abc4 |
20 |
abc5 |
20 |
… |
… |
Solution
Write the following script p1.dfx in esProc:
A |
|
1 |
=file("json.txt").cursor@si() |
2 |
=A1.(json(~)).select(ifr(~) && ~.fno()==2) |
3 |
=file("csv.csv").export@ct(A2) |
Explanation:
A1 Create cursor for the text file and return it. Close the cursor after all data is scanned.
A2 Loop through A1’s cursor to convert each JSON string into a record, during which null will be returned if the braces cannot be matched (which is the dirty data), and then select records having two fields.
A3 Export A2’s result to csv.csv.
Read How to Call an SPL Script in Java to learn about the method of integrating the SPL script into Java.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL