Remove Duplicate Data from a CSV File
Problem description & analysis
Below is the CSV file csv.csv:
a,123,value1,a@email.com
a,123,value1,a@email.com
a,123,value1,a@email.com
a,123,Value7,a@email.com
b,567,Value5,b@email.com
b,567,Value6,b@email.com
b,567,Value6,b@email.com
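We want to remove the duplicate records from the CSV file. Records are compared as whole rows and the comparison is case-sensitive, so value1 and Value7 count as different values. Below is the desired result: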
a,123,Value7,a@email.com
a,123,value1,a@email.com
b,567,Value5,b@email.com
b,567,Value6,b@email.com
Solutions:
Method 1: Through table sequence
Write the following script p1.dfx in esProc:
| | A |
|---|---|
| 1 | =file("csv.csv").import@c() |
| 2 | =A1.group@1(#1,#2,#3,#4) |
| 3 | =file("result.csv").export@c(A2) |
Explanation:
A1 Import the CSV file as a table sequence.
A2 Group the records by all four columns and keep the first record of each group; the result is returned as a record sequence. This is equivalent to a distinct operation over all columns.
A3 Export the result to result.csv.
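For readers who want to check the logic outside esProc, the group-and-take-first idea can be sketched in Python. This is only an analogue, not part of the SPL solution: the names `distinct_records` and `to_csv_text` are hypothetical, the sample rows are held in memory instead of being read from csv.csv, and sorting the groups by key is an assumption made so the output matches the desired result.

```python
import csv
import io

# Sample rows copied from csv.csv in the article
LINES = [
    "a,123,value1,a@email.com",
    "a,123,value1,a@email.com",
    "a,123,value1,a@email.com",
    "a,123,Value7,a@email.com",
    "b,567,Value5,b@email.com",
    "b,567,Value6,b@email.com",
    "b,567,Value6,b@email.com",
]

def distinct_records(lines):
    """Group parsed rows by all columns and keep the first row of each group."""
    first_of_group = {}
    for row in csv.reader(lines):
        key = tuple(row)                     # grouping key: every column
        first_of_group.setdefault(key, row)  # keep only the first row seen
    # Assumption: order the groups by key to match the desired result
    return [first_of_group[key] for key in sorted(first_of_group)]

def to_csv_text(rows):
    """Serialize rows back to comma-separated text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

result_text = to_csv_text(distinct_records(LINES))
print(result_text, end="")
```

Because the key is the whole row, two records are merged only when every column matches exactly, including letter case.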
Method 2: Through strings
Write the following script p1.dfx in esProc:
| | A |
|---|---|
| 1 | =file("csv.csv").read@n() |
| 2 | =A1.id() |
| 3 | =file("result.csv").export(A2) |
Explanation:
A1 Read each row of the CSV file as a string and return a sequence of strings.
A2 Perform a distinct operation on the sequence of strings with the id() function.
A3 Export the result to result.csv.
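The string-based approach treats each line as an opaque value, so no parsing is needed. A rough Python analogue, offered as a sketch and not as the SPL implementation, is shown below; `distinct_lines` is a hypothetical name, the sample lines are kept in memory, and the ascending sort is an assumption chosen to reproduce the order of the desired result.

```python
# Sample lines copied from csv.csv in the article
LINES = [
    "a,123,value1,a@email.com",
    "a,123,value1,a@email.com",
    "a,123,value1,a@email.com",
    "a,123,Value7,a@email.com",
    "b,567,Value5,b@email.com",
    "b,567,Value6,b@email.com",
    "b,567,Value6,b@email.com",
]

def distinct_lines(lines):
    """Return the distinct lines in ascending, case-sensitive order."""
    # set() drops exact duplicates; sorted() orders what remains
    return sorted(set(lines))

unique = distinct_lines(LINES)
print("\n".join(unique))
```

Comparing whole lines is the simplest option, but it also means that rows differing only in whitespace or quoting are treated as different records.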
Method 3: Through sequence of sequences
Write the following script p1.dfx in esProc:
| | A |
|---|---|
| 1 | =file("csv.csv").import@w() |
| 2 | =A1.id() |
| 3 | =file("result.csv").export@c(A2) |
Explanation:
A1 Import the CSV file as a sequence of sequences, one inner sequence per row.
A2 Perform a distinct operation on the sequence of sequences with the id() function.
A3 Export the result to result.csv.
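The same idea can be illustrated in Python with rows held as lists of field values. This is a minimal sketch under stated assumptions and not the SPL code itself: `distinct_rows` is a hypothetical name, the naive `split(",")` assumes no quoted fields or embedded commas, and the sort is assumed so that the output matches the desired result.

```python
# Sample lines copied from csv.csv in the article
LINES = [
    "a,123,value1,a@email.com",
    "a,123,value1,a@email.com",
    "a,123,value1,a@email.com",
    "a,123,Value7,a@email.com",
    "b,567,Value5,b@email.com",
    "b,567,Value6,b@email.com",
    "b,567,Value6,b@email.com",
]

def distinct_rows(lines):
    """Split each line into fields and return the distinct rows, sorted."""
    # Assumption: fields contain no quotes or embedded commas
    rows = [line.split(",") for line in lines]
    # Lists are not hashable, so rows become tuples before deduplication
    unique = {tuple(row) for row in rows}
    return [list(row) for row in sorted(unique)]

rows = distinct_rows(LINES)
result_text = "\n".join(",".join(row) for row in rows)
print(result_text)
```

Rows are compared field by field here, so the result is the same as comparing whole lines as long as every line splits into the same fields.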
Read "How to Call an SPL Script in Java" to learn how to integrate an SPL script with a Java program.