Find Difference Between Two Text Files – Case 3
【Question】
I have two text files, file1.txt and file2.txt:
file1.txt
---------
Syed
Sheethal
Mirko
Rathod
.
.
.
file2.txt
---------
Syed
Vijay
Akash
.
.
.
.
Both files have millions of records. I need to compute “file1.txt - file2.txt”, that is, the records in file1.txt that do not appear in file2.txt.
Can anybody suggest the best approach?
Thanks & Regards,
Syed
【Answer】
What you are asking for is a difference, which is a set operation. Java has no ready-made class library for set operations on data too large to fit in memory, so implementing this in Java takes a lot of code. Try esProc SPL (Structured Process Language) instead. The SPL script below handles the difference operation, and it is simple.
   | A
1  | =file("e:\\f1.txt").cursor().sortx(_1)
2  | =file("e:\\f2.txt").cursor().sortx(_1)
3  | result [A1,A2].mergex@xd(1)
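A1, A2: Open each file as a cursor and sort it externally by its first field (_1).
A3: Find the difference between A1 and A2 (the lines of f1.txt that are not in f2.txt) and return the result to Java.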
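To show why A3 can work on two sorted cursors, here is a minimal Java sketch of a sort-merge difference. It is not esProc's implementation; it only illustrates the technique, assuming both inputs are already sorted in ascending String.compareTo order. The class and method names (SortedDifference, difference) are hypothetical. Because each input is read once from front to back, memory use does not grow with file size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class SortedDifference {

    /**
     * Streams every line of {@code a} that does not occur in {@code b} ("a - b").
     * Assumption: both iterators yield lines in ascending String.compareTo order.
     * Each input is consumed exactly once, so memory use stays constant.
     */
    static void difference(Iterator<String> a, Iterator<String> b, Consumer<String> out) {
        String y = b.hasNext() ? b.next() : null; // current line of b (null = b exhausted)
        while (a.hasNext()) {
            String x = a.next();
            // Skip lines of b smaller than x; they cannot match x or any later line of a.
            while (y != null && y.compareTo(x) < 0) {
                y = b.hasNext() ? b.next() : null;
            }
            // x survives only if b has no equal line.
            if (y == null || !y.equals(x)) {
                out.accept(x);
            }
        }
    }

    public static void main(String[] args) {
        // Sample data from the question, already sorted.
        List<String> f1 = Arrays.asList("Mirko", "Rathod", "Sheethal", "Syed");
        List<String> f2 = Arrays.asList("Akash", "Syed", "Vijay");
        List<String> result = new ArrayList<>();
        difference(f1.iterator(), f2.iterator(), result::add);
        System.out.println(result); // [Mirko, Rathod, Sheethal]
    }
}
```

With real files, each sorted file could be streamed with Files.newBufferedReader(path).lines().iterator(), provided the files were sorted with the same ordering the merge uses. Lines that occur more than once in the first input are all kept; deduplicate the output if strict set semantics are required.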
Here we assume that both files (millions of lines or more) are too large to be loaded into memory at one time. So each file is first sorted externally; with both inputs in order, the difference can be computed in a single merge pass instead of repeated lookups. If the files are relatively small, you don't need to sort them: load them into memory and perform the operation with the diff() function.
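For the small-file case, a plain-Java equivalent simply loads both files into memory and removes one set from the other. This is a minimal sketch, assuming each record is one line and that duplicate lines in file1.txt should be reported once (set semantics); the class and method names (InMemoryDifference, difference) are hypothetical. With real files, the lists could come from Files.readAllLines(Paths.get("file1.txt")).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class InMemoryDifference {

    /** Returns the distinct lines of {@code a} that do not occur in {@code b}, in first-seen order. */
    static Set<String> difference(List<String> a, List<String> b) {
        // LinkedHashSet keeps file1's line order and drops duplicate lines.
        Set<String> result = new LinkedHashSet<>(a);
        // Wrap b in a HashSet so each membership check is O(1) on average,
        // avoiding removeAll's slow path of calling List.contains repeatedly.
        result.removeAll(new HashSet<>(b));
        return result;
    }

    public static void main(String[] args) {
        // Sample data from the question, in file order (unsorted).
        List<String> f1 = Arrays.asList("Syed", "Sheethal", "Mirko", "Rathod");
        List<String> f2 = Arrays.asList("Syed", "Vijay", "Akash");
        System.out.println(difference(f1, f2)); // [Sheethal, Mirko, Rathod]
    }
}
```

This only works while both files fit in the heap; beyond that, the sort-and-merge approach described above is the safer choice.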
esProc offers a series of set operation functions for related computations. An SPL script can be called from Java through esProc JDBC. See “How to Call an SPL Script in Java” to learn more.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL