Column Handling – Large CSV Files
【Question】
I've got a question regarding how to process a delimited file with a large number of columns (>3000). I tried to extract the fields with the standard delimited file input component, but creating the schema took hours, and when I ran the job I got an error because the generated toString() method exceeds Java's 65535-byte limit. After that the job would run, but all the columns were scrambled and I couldn't really work with them anymore.
Is it possible to split that .csv file with Talend? Is there any other way to handle it, maybe with some Java code?
【Answer】
Your requirements are: 1. process a large CSV file; 2. handle a schema of 3000+ columns efficiently by reading only the fields you need; 3. do it from Java code. All of these can be met with SPL (Raqsoft's Structured Process Language). SPL reads a large file through a cursor instead of loading it into memory, provides a rich library of functions for structured computations, and is easy to integrate into a Java application. Below is the SPL script that retrieves only the needed columns from a large CSV file:
|   | A |
|---|---|
| 1 | =file("d:\\data.csv").cursor@tc(field,fieldYouNeed) |
Here @t treats the first row as column titles and @c reads the file as comma-separated; only the columns named in the cursor are actually parsed. The script can easily be embedded in a Java application; see How to Call an SPL Script in Java to learn more.
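For illustration, below is a minimal sketch of running the same SPL expression from Java through the esProc JDBC driver. The driver class and connection URL follow the esProc documentation, the file path and the field names (field, fieldYouNeed) are placeholders, and fetch(1000) is added here only to pull a finite batch from the cursor into the result set.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SplCsvDemo {
    public static void main(String[] args) throws Exception {
        // Register the esProc JDBC driver (the esProc jars must be on the classpath)
        Class.forName("com.esproc.jdbc.InternalDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:esproc:local://");
             Statement st = conn.createStatement();
             // Execute an SPL expression directly; field names are placeholders,
             // fetch(1000) reads the first 1000 rows from the cursor
             ResultSet rs = st.executeQuery(
                 "=file(\"d:/data.csv\").cursor@tc(field,fieldYouNeed).fetch(1000)")) {
            while (rs.next()) {
                System.out.println(rs.getObject(1) + "," + rs.getObject(2));
            }
        }
    }
}
```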
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
YouTube 👉 https://www.youtube.com/@esProc_SPL