Java Stream - Retrieving Repeated Records from CSV
Question
Source: https://stackoverflow.com/questions/68651921/java-stream-retrieving-repeated-records-from-csv
I searched the site and didn't find anything similar. I'm new to Java streams, but I understand that they can replace explicit loops. I would like to know whether a stream can filter a CSV file, as shown below, so that only repeated records appear in the result, grouped by the Center field.
Initial CSV file
Id,Name,Mother,Birth,Center
1,A,A,2000-01-01,1
2,C,A,2000-01-02,1
3,P,M,2000-01-03,2
4,D,S,2000-01-04,3
5,R,H,2000-01-05,4
6,P,M,2000-01-03,2
7,A,A,2000-01-01,1
8,P,C,2000-01-08,2
9,R,I,2000-01-07,3
10,P,M,2000-01-03,2
Final result
Id,Name,Mother,Birth,Center
1,A,A,2000-01-01,1
7,A,A,2000-01-01,1
3,P,M,2000-01-03,2
6,P,M,2000-01-03,2
10,P,M,2000-01-03,2
In addition, a duplicate pair must not appear a second time in reverse order in the final result, as in the table below:
This shouldn't happen
Id,Name,Mother,Birth,Center
1,A,A,2000-01-01,1
7,A,A,2000-01-01,1
7,A,A,2000-01-01,1
1,A,A,2000-01-01,1
Is there a way to do this with a stream and grouping at the same time, given that in theory two loops would be needed to perform the task?
Thanks in advance.
Answer
The task is to find the records that are duplicated on every field except Id (Name, Mother, Birth and Center) and to return those duplicates grouped by the Center field. The Java code for this is comparatively long.
It is simple in SPL, an open-source Java package. One line of code is enough:
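For comparison, here is a minimal sketch of how the same result might be produced with plain Java streams. It is not taken from the original answer. It assumes that Id is the first column, that Center is the last column and is numeric, and that no field contains a quoted comma; the class name RepeatedRecords and the method findRepeated are made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class RepeatedRecords {

    // Returns the data rows that share Name, Mother, Birth and Center with
    // at least one other row, ordered by Center. The header row is skipped.
    static List<String> findRepeated(List<String> csvLines) {
        // Group rows by everything after the Id column (assumption: Id is first).
        // LinkedHashMap keeps groups in order of first appearance.
        Map<String, List<String>> groups = csvLines.stream()
                .skip(1) // header
                .collect(Collectors.groupingBy(
                        line -> line.substring(line.indexOf(',') + 1),
                        LinkedHashMap::new,
                        Collectors.toList()));

        return groups.values().stream()
                .filter(group -> group.size() > 1)              // keep repeated records only
                .sorted(Comparator.comparingInt(
                        group -> centerOf(group.get(0))))       // order groups by Center
                .flatMap(List::stream)                          // back to single rows
                .collect(Collectors.toList());
    }

    // Assumption: Center is the last column and holds an integer.
    static int centerOf(String line) {
        return Integer.parseInt(line.substring(line.lastIndexOf(',') + 1).trim());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Id,Name,Mother,Birth,Center",
                "1,A,A,2000-01-01,1",
                "2,C,A,2000-01-02,1",
                "3,P,M,2000-01-03,2",
                "4,D,S,2000-01-04,3",
                "5,R,H,2000-01-05,4",
                "6,P,M,2000-01-03,2",
                "7,A,A,2000-01-01,1",
                "8,P,C,2000-01-08,2",
                "9,R,I,2000-01-07,3",
                "10,P,M,2000-01-03,2");
        System.out.println(lines.get(0));
        findRepeated(lines).forEach(System.out::println);
    }
}
```

Because each row lands in exactly one group, a duplicate pair cannot show up a second time in reverse order. To read a real file instead of the inline sample, the list could be loaded with Files.readAllLines(Paths.get("repeated.csv")).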
|   | A |
| 1 | =file("repeated.csv").import@ct().group(Name,Mother,Birth,Center).select(~.len()>1).conj() |
SPL provides a JDBC driver that Java can call. Save the SPL script above as repeated.splx and invoke it from Java the same way you would call a stored procedure:
…
Class.forName("com.esproc.jdbc.InternalDriver");
con = DriverManager.getConnection("jdbc:esproc:local://");
st = con.prepareCall("call repeated()");
st.execute();
…
Alternatively, embed the SPL expression as a string in the Java program and execute it the way you would a SQL statement:
…
st = con.prepareStatement("==file(\"repeated.csv\").import@ct().group(Name,Mother,Birth,Center).select(~.len()>1).conj()");
st.execute();
…