Compare two CSV files

A.csv and B.csv are two large CSV files with the same structure. The Name and Dept fields together form their primary key. Some records differ between the two files.

A.csv

Name,Dept,Salary
Jonathan,Administration,7
Alexis,Administration,16000
Timothy,Administration,0
Michael,Administration,0
Alexis_,Administration,0
Ashley,Finance,11000

B.csv

Name,Dept,Salary
Jonathan,Administration,7
Alexis,Administration,16
Timothy,Finance,5000
Ashley,Finance,11000
Daniel,HR,1600
Joseph_,Finance,1600

Use Java to compare the primary keys of the two files and find the records that exist in A but not in B, according to their key values. The expected result is:

Name     Dept            Salary
Alexis_  Administration  0
Michael  Administration  0
Timothy  Administration  0
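For reference, the key-based difference itself can be sketched in plain Java. This is only a minimal illustration under several assumptions: both files fit in memory, each list of lines starts with the header row, every data line has at least the Name and Dept fields, and no field contains a quoted comma. The class name CsvKeyDiff and its methods are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CsvKeyDiff {
    // Builds the composite key (Name, Dept) from one CSV data line.
    // Assumption: no quoted commas, so a plain split on "," is enough.
    static List<String> key(String line) {
        String[] f = line.split(",", -1);
        return Arrays.asList(f[0], f[1]);
    }

    // Returns the data lines of A whose (Name, Dept) key does not occur in B.
    // Both lists are expected to start with the header line, which is skipped.
    static List<String> onlyInA(List<String> aLines, List<String> bLines) {
        Set<List<String>> bKeys = new HashSet<>();
        for (String line : bLines.subList(1, bLines.size())) {
            bKeys.add(key(line));
        }
        List<String> result = new ArrayList<>();
        for (String line : aLines.subList(1, aLines.size())) {
            if (!bKeys.contains(key(line))) {
                result.add(line);
            }
        }
        return result;
    }
}
```

With the sample data, onlyInA returns the Timothy, Michael and Alexis_ rows in the order they appear in A.csv, not sorted by key. Because all keys of B are held in a HashSet, this approach does not scale to files larger than memory.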

Write the following SPL code:



    A                                      B
1   =T@c("A.csv")                          =T@c("B.csv")
2   =A1.sortx(Name,Dept)                   =B1.sortx(Name,Dept)
3   =[A2,B2].merge@d(Name,Dept).fetch()


The T() function parses a CSV file; the @c option makes it possible to retrieve data from a file that does not fit into memory. The sortx() function sorts the data in a cursor. The merge() function merges two cursors; the @d option computes their difference, here the records of A whose Name and Dept values do not appear in B.

The SPL script above can also be written as a single statement:

=[T@c("A.csv").sortx(Name,Dept),T@c("B.csv").sortx(Name,Dept)].merge@d(Name,Dept).fetch()
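To make the sort-then-merge idea behind sortx() and merge@d more concrete, here is a minimal Java sketch of a difference over two inputs that are already sorted by key. It is not SPL's implementation, only an illustration of the general technique. The class name SortedDiff and the method difference are hypothetical, and the sketch assumes both iterators are sorted by the same comparator and contain no null elements.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

class SortedDiff {
    // Collects every element of sortedA that has no equal element (per cmp) in sortedB.
    // Both inputs must already be sorted by cmp; each is read once, front to back,
    // so only the current element of each input is held at any time.
    static <T> List<T> difference(Iterator<T> sortedA, Iterator<T> sortedB, Comparator<T> cmp) {
        List<T> out = new ArrayList<>();
        T b = sortedB.hasNext() ? sortedB.next() : null;
        while (sortedA.hasNext()) {
            T a = sortedA.next();
            // Advance B until its current key is no longer smaller than A's key.
            while (b != null && cmp.compare(b, a) < 0) {
                b = sortedB.hasNext() ? sortedB.next() : null;
            }
            // B is exhausted or its current key is larger: a has no match in B.
            if (b == null || cmp.compare(b, a) != 0) {
                out.add(a);
            }
        }
        return out;
    }
}
```

Since both inputs are consumed sequentially, the iterators could be backed by sorted files on disk, which is why sorting first lets the comparison work on data larger than memory. Here the result is collected into a list, much as fetch() gathers the final records in the SPL code.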

Read "How to Call a SPL Script in Java" to learn how to integrate SPL into a Java application.
Source:
https://stackoverflow.com/questions/75987204/efficiently-comparing-two-large-java-lists-to-find-unique-items