Align Two CSV Files by a Specific Column to Compare and Output the Differences
Problem description & analysis
data1.csv AND data2.csv have same structure (same column names) and both use id column to uniquely identify records. The two tables have same id values but could have different values in the other columns.
data1.csv
id,v1,v2 1001,1001_v1,1001_v2 1002,1002_v1,1002_v2 1003,1003_v1,1003_v2 1004,1004_v1,1004_v2 1005,1005_v1,1005_v2 … |
data2.csv
id,v1,v2 1005,1005_v2,1005_v1 1001,1001_v1,1001_v2 1002,1002_v1c,1002_v2 1003,1003_v1c,1003_v2c 1004,1004_v1,1004_v2 … |
The computing task is to find differences of the two files based on the other columns and output the result to result.csv. Here is the specific rule: For records of same id values in the two files, if the one in data2.csv has any value in the other column is different from the counterpart in data1.csv, then output the record of data2.csv. Below is part of the result file:
result.csv
id,v1,v2 1002,1002_v1c,1002_v2 1003,1003_v1c,1003_v2c 1005,1005_v2,1005_v1 … |
The algorithm involves order-based merge and difference operation.
Solution
We write the following script in esProc:
A |
|
1 |
=file("041/data1.csv").import@ct() |
2 |
=file("041/data2.csv").import@ct() |
3 |
=[A2,A1].merge@od() |
4 |
=file("result.csv").export@ct(A3) |
Merge file1 and file2 using @o option, which merges unordered memory data, and @d option, which performs difference intersection. With no to-be-compared columns specified, the function compares the whole records.
The script can also be written in the command line:
d:\raqsoft64\esProc\bin> .\esprocx -r file(\"result.csv\").export@ct([file(\"data2.csv\").import@ct(),file(\"data1.csv\").import@ct()].merge@d()) |
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_Desktop/
SPL Learning Material 👉 https://c.scudata.com
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProcDesktop
Linkedin Group 👉 https://www.linkedin.com/groups/14419406/