Process a large CSV file with parallel processing
A CSV file stores a large amount of order data.
OrderID,Client,SellerID,Amount,OrderDate
1,SPLI,219,9173,01/17/2022
2,HU,110,6192,10/01/2020
3,SPL,173,5659,04/23/2020
4,OFS,7,3811,02/05/2023
5,ARO,146,3752,08/27/2021
6,SRR,449,10752,05/27/2022
7,SJCH,326,11719,01/18/2022
8,JDR,3,11828,12/09/2021
Use Java to process this file: find the orders whose amounts are between 3,000 (inclusive) and 5,000 (exclusive), group them by client, and compute each client's total order amount and order count. The expected result looks like this:
Client | amt | cnt
ARO | 11948382 | 2972
BDR | 11720848 | 2933
BON | 11864952 | 2960
BSF | 11947734 | 2980
CHO | 11806401 | 2968
CHOP | 11511201 | 2877
D | 11491452 | 2876
DSG | 11672114 | 2910
DSGC | 11656479 | 2918
Write the following SPL statement:
=file("d:/OrdersBig.csv").cursor@mtc(;8).select(Amount>=3000 && Amount<5000).groups(Client;sum(Amount):amt,count(1):cnt)
The cursor() function parses a large file that cannot fit into memory; by default, it computes serially. The @m option enables multithreaded data retrieval, and 8 is the number of parallel threads. The @t option reads the first line as column titles, and the @c option uses the comma as the separator.
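For comparison, the same filter, group, and aggregate logic can be sketched in plain Java with a parallel stream. This is only an illustration of what the SPL statement computes, not how SPL implements it. The class name OrdersByClient and the method summarize are hypothetical, and the sketch assumes that no field contains a quoted comma and that the range is half-open (3000 <= Amount < 5000), as in the SPL statement.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.DoubleSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class; not part of SPL or esProc.
class OrdersByClient {

    /**
     * Keeps orders with 3000 <= Amount < 5000 and returns, per client,
     * statistics whose getSum() is the total amount and getCount() the order count.
     * The first line is treated as the header row and skipped.
     */
    static Map<String, DoubleSummaryStatistics> summarize(Stream<String> lines) {
        return lines
                .skip(1)                                   // header: OrderID,Client,SellerID,Amount,OrderDate
                .parallel()                                // let the stream use multiple threads
                .filter(line -> !line.isBlank())           // tolerate a trailing empty line
                .map(line -> line.split(","))              // assumption: no quoted commas in fields
                .map(f -> Map.entry(f[1], Double.parseDouble(f[3])))
                .filter(e -> e.getValue() >= 3000 && e.getValue() < 5000)
                .collect(Collectors.groupingBy(
                        e -> e.getKey(),                   // group by Client
                        TreeMap::new,                      // sorted by client name
                        Collectors.summarizingDouble(e -> e.getValue())));
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from the SPL statement; pass another path as the first argument.
        Path csv = Path.of(args.length > 0 ? args[0] : "d:/OrdersBig.csv");
        try (Stream<String> lines = Files.lines(csv)) {
            summarize(lines).forEach((client, s) ->
                    System.out.printf("%s\t%.0f\t%d%n", client, s.getSum(), s.getCount()));
        }
    }
}
```

Files.lines reads the file lazily, so the whole file does not have to fit into memory. How well the parallel stream scales depends on the file and the machine, which is part of what the single SPL line hides.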
See "How to Call a SPL Script in Java" to learn how to integrate SPL into a Java application.
Source: https://stackoverflow.com/questions/70586145/how-to-read-a-specific-column-of-a-row-from-a-csv-file-in-java