Delete Duplicate Records
【Question】
I am looking for some help. I have an application at work that generates a CSV file with user information. I want to use Java to read the data, delete the duplicate information, rearrange it, and create a spreadsheet to make life easier. The CSV is generated in the following format, but is much larger:
21458952, a1234, Doe, John, technology, support staff, work phone, 555-555-5555
21458952, a1234, Doe, John, technology, support staff, work email, johndoe@whatever.net
21458952, a1234, Doe, John, technology, support staff, work pager, 555-555-5555
99946133, b9854, Paul, Jane, technology, administration, work phone, 444-444-4444
99946133, b9854, Paul, Jane, technology, administration, work email, janepaul@whatever.net
99946133, b9854, Paul, Jane, technology, administration, work pager, 444-444-4444
99946133, b9854, Paul, Jane, technology, administration, cell phone, 444-444-4444
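The sample rows separate fields with a comma followed by a space, so splitting on "," alone leaves a leading space on every field after the first (" work phone" rather than "work phone"). A minimal sketch that trims the separators while splitting, assuming no field contains a quoted or embedded comma; the class and method names are hypothetical:

```java
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
public class CsvFields {
    // Splits one line on commas and drops any whitespace around each comma.
    // Assumption: no field contains a quoted or embedded comma.
    public static String[] parse(String line) {
        return line.trim().split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        String line = "21458952, a1234, Doe, John, technology, support staff, work phone, 555-555-5555";
        String[] f = parse(line);
        System.out.println(f.length); // 8
        System.out.println(Arrays.toString(f));
    }
}
```

Spaces inside a field ("support staff") are kept, because only whitespace next to a comma is consumed.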
I want to delete the duplicates and arrange the data in these columns:
ID | PIN | Lname | Fname | Dept | team | work px | work email
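One way to model the target layout in Java is a small record with one component per column. This is a sketch, assuming Java 16+ (for records) and reading "work px" as the work phone number; the names TargetLayout, PersonRow and toPipeRow are hypothetical:

```java
// Hypothetical types, for illustration only (Java 16+ for records).
public class TargetLayout {
    // One output row, in the column order listed above.
    public record PersonRow(String id, String pin, String lname, String fname,
                            String dept, String team, String workPhone, String workEmail) {

        // Renders the row in the same "a | b | c" style as the header.
        public String toPipeRow() {
            return String.join(" | ", id, pin, lname, fname, dept, team, workPhone, workEmail);
        }
    }

    public static void main(String[] args) {
        PersonRow r = new PersonRow("21458952", "a1234", "Doe", "John", "technology",
                "support staff", "555-555-5555", "johndoe@whatever.net");
        System.out.println(r.toPipeRow());
    }
}
```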
I have been trying to store the data in arrays built from lines read with a BufferedReader, but I am having trouble handling the duplicates and reshaping the data into a table.
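Part of the difficulty with arrays is that String[] compares by identity, so two arrays holding the same values are never treated as duplicates by a HashSet. A minimal sketch that instead keys each person on a List of the first six fields (ID through team), which compares by value, and uses a LinkedHashSet to drop repeats while keeping first-seen order. It assumes the lines are already split into trimmed fields; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, for illustration only.
public class UniquePeople {
    // Keeps fields 1-6 of each row and drops repeated combinations,
    // preserving the order in which people first appear.
    public static List<List<String>> uniquePeople(List<String[]> rows) {
        Set<List<String>> seen = new LinkedHashSet<>();
        for (String[] f : rows) {
            // List.of has value-based equals/hashCode, so identical prefixes collapse
            seen.add(List.of(f[0], f[1], f[2], f[3], f[4], f[5]));
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[]{"21458952", "a1234", "Doe", "John", "technology", "support staff", "work phone", "555-555-5555"},
            new String[]{"21458952", "a1234", "Doe", "John", "technology", "support staff", "work email", "johndoe@whatever.net"},
            new String[]{"99946133", "b9854", "Paul", "Jane", "technology", "administration", "work phone", "444-444-4444"});
        uniquePeople(rows).forEach(System.out::println);
    }
}
```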
This is the code I have so far:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Sort {
    public static void main(String[] args) {
        String csvSplitBy = ",";
        String outPut;
        // try-with-resources closes the reader automatically, so no finally block is needed
        try (BufferedReader br = new BufferedReader(
                new FileReader("C:/Users/Jason/Desktop/test.txt"))) { // location where the file is retrieved
            String line;
            while ((line = br.readLine()) != null) { // loops until the end of the file
                String[] id = line.split(csvSplitBy);
                if (id.length < 8) {
                    continue; // skips blank or incomplete lines
                }
                // Each row has 8 fields, so the valid indices are 0-7 (incomplete, used for testing)
                outPut = id[0] + "," + id[1] + "," + id[2] + "," + id[3] + ","
                        + id[4] + "," + id[5] + "," + id[6] + "," + id[7];
                System.out.println(outPut); // displays the contents of the .txt file
            } // ends while statement
        } catch (IOException e) {
            // IOException covers read errors as well as a missing file
            System.out.println("Could not read the file: " + e.getMessage());
        }
    } // ends main method
} // ends class Sort
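For the "create a spreadsheet" step, one simple option is to write the cleaned rows back out as a CSV file with a header line, which Excel or LibreOffice can open directly. A sketch, assuming Java 11+ and that no value contains a comma or double quote (such values would need quoting); the class name, method name and output file name are hypothetical:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
public class WriteResult {
    // Writes a header line plus one comma-separated line per row.
    // Assumption: no value contains a comma or a double quote.
    public static void writeCsv(Path out, List<String> header, List<List<String>> rows) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add(String.join(",", header));
        for (List<String> row : rows) {
            lines.add(String.join(",", row));
        }
        Files.write(out, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path out = Path.of("result.csv"); // hypothetical output file name
        writeCsv(out,
            List.of("ID", "PIN", "Lname", "Fname", "Dept", "Team", "Work phone", "Work email"),
            List.of(List.of("21458952", "a1234", "Doe", "John", "technology", "support staff",
                            "555-555-5555", "johndoe@whatever.net")));
        System.out.println(Files.readAllLines(out));
    }
}
```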
【Answer】
Java's standard library has no built-in CSV parser or table-style operations for grouping records and pivoting values, so hand-coding this is rather involved. With SPL (Structured Process Language), the whole job takes three lines:
  | A
1 | =file("D:\\dup.csv").import@c()
2 | =A1.group(_1,_2,_3,_4,_5,_6;~.select@1(_7=="work phone")._8,~.select@1(_7=="work email")._8)
3 | =file("D:\\result.csv").export@c(A2)
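A1: Read the contents of dup.csv as comma-separated data.
A2: Group the rows by the first six fields (ID, PIN, last name, first name, department, team), which removes the duplicates, and for each group take field 8 from the first row whose field 7 is "work phone" and from the first row whose field 7 is "work email".
A3: Export A2's table to result.csv.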
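For readers who would rather stay in core Java, a rough equivalent of the result described for A2 is sketched below. It is not how SPL implements group; it groups rows by fields 1-6 in first-seen order with a LinkedHashMap, then fills the two extra columns from the first row whose field 7 matches, assuming select@1 means "first match". When no row matches, this sketch writes an empty string, which is an assumption and may differ from SPL's behavior. Rows are assumed to be split into trimmed fields, and the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only.
public class PivotContacts {
    // Returns field 8 (index 7) of the first row whose field 7 (index 6)
    // equals the given type, or "" if no row matches.
    static String firstValue(List<String[]> group, String type) {
        for (String[] f : group) {
            if (f[6].equals(type)) {
                return f[7];
            }
        }
        return "";
    }

    // Groups rows by fields 1-6 in first-seen order, then builds one 8-column row per person.
    public static List<List<String>> pivot(List<String[]> rows) {
        Map<List<String>, List<String[]>> groups = new LinkedHashMap<>();
        for (String[] f : rows) {
            List<String> key = List.of(f[0], f[1], f[2], f[3], f[4], f[5]);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(f);
        }
        List<List<String>> result = new ArrayList<>();
        for (Map.Entry<List<String>, List<String[]>> e : groups.entrySet()) {
            List<String> row = new ArrayList<>(e.getKey());
            row.add(firstValue(e.getValue(), "work phone"));
            row.add(firstValue(e.getValue(), "work email"));
            result.add(row);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[]{"21458952", "a1234", "Doe", "John", "technology", "support staff", "work phone", "555-555-5555"},
            new String[]{"21458952", "a1234", "Doe", "John", "technology", "support staff", "work email", "johndoe@whatever.net"},
            new String[]{"21458952", "a1234", "Doe", "John", "technology", "support staff", "work pager", "555-555-5555"});
        pivot(rows).forEach(System.out::println);
    }
}
```

Rows with other types ("work pager", "cell phone") are simply ignored, matching the two columns the question asks for.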
An SPL script can also be embedded in a Java program for further computation; see "How to Call an SPL Script in Java" for details.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL