Get duplicate records from a csv file and sort them
We have a large CSV file (shown below) in which the Center column is the category and the other columns hold detail data. Some of the detail records are duplicated.
Id,Name,Mother,Birth,Center
1,Antonio Carlos da Silva,Ana da Silva,2008/03/31,1
2,Carlos Roberto de Souza,Amalia Maria de Souza,2004/12/10,1
3,Pedro de Albuquerque,Maria de Albuquerque,2006/04/03,2
4,Danilo da Silva Cardoso,SOnia de Paula Cardoso,2002/08/10,3
5,Ralfo dos Santos Filho,Helena dos Santos,2012/02/21,4
6,Pedro de Albuquerque,Maria de Albuquerque,2006/04/03,2
7,Antonio Carlos da Silva,Ana da Silva,2008/03/31,1
8,Paula Cristina de Abreu,Cristina Pereira de Abreu,2014/10/25,2
9,Rosana Pereira de Campos,Ivana Maria de Campos,2002/07/16,3
10,Pedro de Albuquerque,Maria de Albuquerque,2006/04/03,2
Use Java to perform stream-style retrieval and computation: find all duplicate records within each group, sort them by the Center column, and write them to a new file with the fields in the order Center, Id, Name, Mother, Birth (as shown below):
Center,Id,Name,Mother,Birth
1,1,Antonio Carlos da Silva,Ana da Silva,2008/03/31
1,7,Antonio Carlos da Silva,Ana da Silva,2008/03/31
2,3,Pedro de Albuquerque,Maria de Albuquerque,2006/04/03
2,6,Pedro de Albuquerque,Maria de Albuquerque,2006/04/03
2,10,Pedro de Albuquerque,Maria de Albuquerque,2006/04/03
Write the SPL code:
|   | A |
|---|---|
| 1 | =T@c("data.csv") |
| 2 | =A1.sortx(Center,Name,Mother,Birth) |
| 3 | =A2.group(Center,Name,Mother,Birth) |
| 4 | =A3.select(~.len()>1).conj() |
| 5 | =T("result.csv":A4,Center,Id,Name,Mother,Birth) |
A1: Open the CSV file, which is too large to fit in memory, as a stream and return a cursor.
A2: Sort the data by Center, Name, Mother and Birth.
A3: Group the sorted data by Center, Name, Mother and Birth without aggregating, so that each group keeps its member records.
A4: Select the groups that have more than one member and concatenate the members of those groups.
A5: Write the cursor's data to a new CSV file, specifying the order of the fields.
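For comparison, the same group, filter and flatten logic can be sketched in plain Java streams. This is only an illustration, not the method used by the SPL script: the class name DuplicateRecords and the method duplicates are hypothetical, the rows are assumed to fit in memory (unlike the cursor-based script above), and fields are assumed to contain no quoted commas.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class; the names are illustrative only.
public class DuplicateRecords {

    // Input columns: Id(0), Name(1), Mother(2), Birth(3), Center(4).
    // Returns output lines in the order Center,Id,Name,Mother,Birth.
    public static List<String> duplicates(List<String> csvLines) {
        // Group rows by every field except Id (assumption: in-memory grouping).
        Map<String, List<String[]>> groups = csvLines.stream()
                .skip(1)                                  // skip the header row
                .filter(line -> !line.trim().isEmpty())   // ignore blank lines
                .map(line -> line.split(",", -1))         // naive split, no quoting
                .collect(Collectors.groupingBy(
                        f -> String.join(",", f[4], f[1], f[2], f[3])));

        // Order by Center, then the remaining key fields, then Id.
        Comparator<String[]> order = Comparator
                .<String[]>comparingInt(f -> Integer.parseInt(f[4]))
                .thenComparing(f -> f[1])
                .thenComparing(f -> f[2])
                .thenComparing(f -> f[3])
                .thenComparingInt(f -> Integer.parseInt(f[0]));

        List<String> out = new ArrayList<>();
        out.add("Center,Id,Name,Mother,Birth");
        groups.values().stream()
                .filter(g -> g.size() > 1)                // keep groups with duplicates
                .flatMap(List::stream)                    // concatenate group members
                .sorted(order)
                .map(f -> String.join(",", f[4], f[0], f[1], f[2], f[3]))
                .forEach(out::add);
        return out;
    }
}
```

The lines of data.csv could be passed in with Files.readAllLines and the result written out with Files.write. For a file that really does not fit in memory, an external sort or a cursor-style approach like the one in the SPL script would be needed instead.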
Read How to Call a SPL Script in Java to learn how to integrate SPL into a Java application.
Source: https://stackoverflow.com/questions/68651921/java-stream-retrieving-repeated-records-from-csv