Column Handling – Large CSV Files
【Question】
I've got a question regarding how to process a delimited file with a large number of columns (>3000). I tried to extract the fields with the standard delimited file input component, but creating the schema took hours, and when I ran the job I got an error because the generated toString() method exceeds Java's 65535-byte limit. After that the job would run, but all the columns were scrambled and I couldn't really work with them anymore.
Is it possible to split that .csv file with Talend? Is there any other way to handle it, maybe with some Java code?
【Answer】
Your requirements are: 1. process a large CSV file; 2. handle a schema of 3000+ columns efficiently by reading only the fields you need; 3. do it from Java code. All of these can be met with SPL (Raqsoft's Structured Process Language). SPL reads a large file through a cursor instead of loading it into memory, provides a rich library of functions for structured computations, and is easy to integrate into a Java application. Below is the SPL script that retrieves only the needed columns from a large CSV file:
|   | A |
|---|---|
| 1 | =file("d:\\data.csv").cursor@tc(field,fieldYouNeed) |
Here @t treats the first row as column titles and @c reads the file as comma-separated; only the columns named in the cursor are actually parsed. The script can easily be embedded in a Java application; see How to Call an SPL Script in Java to learn more.
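For illustration, below is a minimal sketch of running the same SPL expression from Java through the esProc JDBC driver. The driver class and connection URL follow the esProc documentation, the file path and the field names (field, fieldYouNeed) are placeholders, and fetch(1000) is added here only to pull a finite batch from the cursor into the result set.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SplCsvDemo {
    public static void main(String[] args) throws Exception {
        // Register the esProc JDBC driver (the esProc jars must be on the classpath)
        Class.forName("com.esproc.jdbc.InternalDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:esproc:local://");
             Statement st = conn.createStatement();
             // Execute an SPL expression directly; field names are placeholders,
             // fetch(1000) reads the first 1000 rows from the cursor
             ResultSet rs = st.executeQuery(
                 "=file(\"d:/data.csv\").cursor@tc(field,fieldYouNeed).fetch(1000)")) {
            while (rs.next()) {
                System.out.println(rs.getObject(1) + "," + rs.getObject(2));
            }
        }
    }
}
```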
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
YouTube 👉 https://www.youtube.com/@esProc_SPL