Merge CSV Files into One Document with No Repeats in Row in Java
Question
I have to merge two csv files. I have implemented some code and the files merge yet I am getting repeated rows from the second document written to new lines. Like This : D,M,20211217,test17045,ehdef,ase_17045_26332@ukuat.com,38008621179,2021092700210571,16880,17045,UID1704510000037,1704537,222,0,20000101,London,510000,,0 // First file D,M,20211217,2021092700210471,UID1704510000027,16880,17045 // Second File.
I am looking to merge the two rows together by UniqueID field. This is the CSV Parser: public class CsvParser {
// CODE
public static List<CsvVo> getRecodrsFromACsv(File file, List<String> keys) throws IOException {
BufferedReader br = new BufferedReader(new FileReader(file));
List<CsvVo> records = new ArrayList<>();
boolean isHeader = true;
String line = null;
while ((line = br.readLine()) != null) {
if (isHeader) {// first line is header
isHeader = false;
continue;
}
CsvVo record = new CsvVo(file.getName());
String[] lineSplit = line.split(",");
for (int i = 0; i < lineSplit.length; i++) {
record.put(keys.get(i), lineSplit[i]);
}
records.add(record);
}
br.close();
return records;
}
public static List<String> getHeadersFromACsv(File file) throws IOException {
// if (file.exists()) {
BufferedReader br = new BufferedReader(new FileReader(file));
List<String> headers = null;
String line = null;
while ((line = br.readLine()) != null) {
String[] lineSplit = line.split(",");
headers = new ArrayList<>(Arrays.asList(lineSplit));
log.info("HEADERS :" + headers);
break;
}
br.close();
return headers;
// }
// return null;
}
public static void writeToCsv(final File file, final Set<String> headers, final List<CsvVo> records)
throws IOException {
FileWriter csvWriter = new FileWriter(file);
// write headers
String sep = "";
String[] headersArr = headers.toArray(new String[headers.size()]);
for (String header : headersArr) {
csvWriter.append(sep);
csvWriter.append(header);
sep = "|";
}
csvWriter.append("\n");
// write records at each line
for (CsvVo record : records) {
sep = "";
for (String s : headersArr) {
csvWriter.append(sep);
csvWriter.append(record.get(s));
sep = "|";
}
csvWriter.append("\n");
}
csvWriter.flush();
csvWriter.close();
}
This is the Merge Model //CODE
public class CsvVo {
private Map<String, String> keyVal;
public CsvVo(String id) {
keyVal = new LinkedHashMap<>();// you may also use HashMap if you don't need to keep order
}
public Map<String, String> getKeyVal() {
return keyVal;
}
public void setKeyVal(Map<String, String> keyVal) {
this.keyVal = keyVal;
}
public void put(String key, String val) {
keyVal.put(key, val);
}
public String get(String key) {
return keyVal.get(key);
}
This is the implementation: //CODE
File aseFile = new File("merge/mergeFile.txt");
File newFile = new File("dpcFileReturn.txt");
log.info("File To Be Processed :" + newFile.getName());
List<String> csv1Headers = CsvParser.getHeadersFromACsv(aseFile);
csv1Headers.forEach(h -> System.out.print(h + " "));
// System.out.println();
List<String> csv2Headers = CsvParser.getHeadersFromACsv(newFile);
csv2Headers.forEach(h -> System.out.print(h + " "));
// System.out.println();
List<String> allCsvHeaders = new ArrayList<>();
allCsvHeaders.addAll(csv1Headers);
allCsvHeaders.addAll(csv2Headers);
allCsvHeaders.forEach(h -> System.out.print(h + " "));
// System.out.println();
Set<String> uniqueHeaders = new HashSet<>(allCsvHeaders);
uniqueHeaders.forEach(h -> System.out.print(h + " "));
// System.out.println();
List<CsvVo> csv1Records = CsvParser.getRecodrsFromACsv(aseFile, csv1Headers);
List<CsvVo> csv2Records = CsvParser.getRecodrsFromACsv(newFile, csv2Headers);
List<CsvVo> allCsvRecords = new ArrayList<>();
allCsvRecords.addAll(csv1Records);
allCsvRecords.addAll(csv2Records);
File mergedFile = new File("mergedFile.txt");
CsvParser.writeToCsv(new File("mergedFile.txt"), uniqueHeaders, allCsvRecords);
log.info("Merged File :" + mergedFile);
The first file
recordType,activityType,activityDate,foreName,surName,emailAddress,mobilePhone,dpid,clientID,programmeID,uniqueID,bankAccount,sortCode,isJointAccount,dateOfBirth,addressLine1,postCode,clientReference,suspension,
D|M|20211217|test17045|afdib|ase_17045_29894@ukuat.com|30934992219|2021092700210261|16880|17045|UID1704510000006|1704506|003|0|20000101|London|510000||0|
D|M|20211217|test17045|ibabi|ase_17045_42069@ukuat.com|07676909173|2021092700210271|16880|17045|UID1704510000007|1704507|278|0|20000101|London|510000||0|
secondFile
H,activityType,activityDate,dpid,uniqueID,clientID,programmeID, D,M,20211217,2021092700210261,UID1704510000006,16880,17045, D,M,20211217,2021092700210271,UID1704510000007,16880,17045,
I am looking for a merged file that takes the user name, surname, email and/or uniqueID and validates the data and overwrites the row with the new(Missing) data, if any.
I can find nothing and I've even tried to have a nested loop to check row substring of file 1 against file 2. Cannot seem to get it working though.
Any assistance would be greatly appreciated.
Answer
The merge is to replace data in the first CSV file with corresponding new data in the second file, if any, and save result as new file. Java does not have the special class libraries for performing set-oriented calculations, so the process will be really roundabout.
You can use SPL, Java’s open-source package, to do this. Only a few lines are enough:
A |
|
1 |
=file("mergeFile.txt").import@ct() |
2 |
=file("dpcFileReturn.txt").import@ct() |
3 |
>A1.run(~.modify@f(ifn(A2.select@1(uniqueID==A1.uniqueID && activityDate>A1.activityDate),~))) |
4 |
=file("mergedFile.txt").export@ct(A1) |
SPL offers JDBC driver to be invoked by a Java program. Just store the above SPL script as mergeFile.splx and invoke it in Java in the way of calling a stored procedure:
…
Class.forName("com.esproc.jdbc.InternalDriver");
con = DriverManager.getConnection("jdbc:esproc:local://");
st = con.prepareCall("call mergeFile ()");
st.execute();
…
SPL source code: https://github.com/SPLWare/esProc
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL
Chinese version