Multi-file Join
【Question】
I have a big text file containing location data represented as numeric IDs, and each ID needs to be replaced with its corresponding location name. The ID-to-location mapping is stored in a separate text file.
Could you help with a Java utility that replaces the numeric values in one file with the location values from the other?
Below is an example of the contents of file 1 and file 2. File 1 has numeric data and text. The numeric data in the first column needs to be replaced with the corresponding entry from file 2, so file 1 has to be matched against every entry in file 2.
Text File1:
19922973 @Uniquehope was good
Text File2:
19922973 Chicago, IL
I need to replace 19922973 with Chicago, IL. Please provide your inputs.
【Answer】
What you need is essentially a SQL-style join on the first column. Java's standard library has no built-in join operation for structured text data, so hand-written code becomes complicated, especially when the files are too large to fit into memory. Here I use esProc SPL (Structured Process Language) to implement the file join:
| | A |
|---|---|
| 1 | =file("D:\\file1.txt").import() |
| 2 | =file("D:\\file2.txt").import() |
| 3 | =join@1(A1:f1,_1; A2:f2,_1) |
| 4 | =A3.new(f2._2,f1._2) |
| 5 | =file("D:\\result.txt").export(A4) |
A1/A2: Import the two text files respectively.
A3: Join the two tables on their first column (_1), naming them f1 and f2 respectively. The @1 option performs the desired left join by aligning A2 to A1.
A4: Get the desired fields from A3's result (the location from f2 and the text from f1) to form a new table sequence.
A5: Export A4’s result set to a target file.
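For comparison, the same left join can be sketched in plain Java with a hash map. This is only a minimal illustration under several assumptions that the question does not confirm: the ID is separated from the rest of the line by whitespace, file 2 is small enough to hold in memory, an ID with no match is left unchanged, and the output uses a tab between the location and the text. The class name LocationJoin and its method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class: a hash-based left join of file 1 against file 2.
public class LocationJoin {

    // Splits a line into {key, rest} at the first run of whitespace (assumed delimiter).
    static String[] splitFirst(String line) {
        String[] parts = line.trim().split("\\s+", 2);
        return new String[] { parts[0], parts.length > 1 ? parts[1] : "" };
    }

    // Loads file 2 (ID -> location) into memory; assumes it is the smaller file.
    static Map<String, String> loadLookup(Path lookupFile) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        try (BufferedReader in = Files.newBufferedReader(lookupFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) continue; // skip blank lines
                String[] kv = splitFirst(line);
                lookup.put(kv[0], kv[1]);
            }
        }
        return lookup;
    }

    // Replaces the leading ID of one line; keeps the ID when no match exists (assumption).
    static String replaceId(String line, Map<String, String> lookup) {
        String[] kv = splitFirst(line);
        String location = lookup.getOrDefault(kv[0], kv[0]);
        return location + "\t" + kv[1];
    }

    // Streams file 1 line by line, so only file 2 has to fit in memory.
    static void join(Path dataFile, Path lookupFile, Path outFile) throws IOException {
        Map<String, String> lookup = loadLookup(lookupFile);
        try (BufferedReader in = Files.newBufferedReader(dataFile, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(outFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) continue; // skip blank lines
                out.write(replaceId(line, lookup));
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("Usage: java LocationJoin <file1> <file2> <result>");
            return;
        }
        join(Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2]));
    }
}
```

It could be run as `java LocationJoin D:\file1.txt D:\file2.txt D:\result.txt`. Because file 1 is streamed, its size is not a limit, but this sketch breaks down if file 2 itself is too large for memory, which is the harder case mentioned above.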
To call an SPL script from a Java program, refer to "How to Call an SPL Script in Java".
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL