A Join between a Text File & JSON Text
【Question】
I have a tab-delimited text file A (representing a BLAST output):
Name1BBBBBBBBBBBB 99.40 166 1 0 1 166 334 499 3e-82 302
Name2DDDDDDDDDDDD 98.80 167 2 0 1 167 346 512 4e-81 298
And a text file B (representing a phylogenetic dendrogram) looking like:
{
"member": {
"Cluster A": "BBBBBBBBBBBB This is Animal A",
},
"name": "Cluster A"
},
{
"member": {
"Cluster B": "DDDDDDDDDDDD This is Animal B"
},
"name": "cluster B"
}
I want to take the string found in the 2nd tab of text file A (DDDDDDDDDDDD for example) and look it up in text file B. The script should then add the info found in text file B into a new tab of text file A :
Name1BBBBBBBBBBBB 99.40 166 1 0 1 166 334 499 3e-82 302 Cluster A This is Animal A
Name2DDDDDDDDDDDD 98.80 167 2 0 1 167 346 512 4e-81 298 Cluster B This is A
【Answer】
If we can read in the two files as two tables, then a SQL join can handle your question. Unfortunately both Perl and Shell don’t offer functions for conversion of text files into tables. It’s rather complicated to hardcode the process. In this case I choose to solve your problem with esProc SPL (Structured Process Language):
A |
|
1 |
=json(file("json.txt").read()) |
2 |
=A1.new(#1.name:name,#1.(#1):cluster,(firstblank=pos(cluster," "),left(cluster,firstblank-1)):key,right(cluster,len(cluster)-firstblank):value) |
3 |
=file("file.txt").import() |
4 |
=join(A3,_2;A2,key).new(_1._1,_1._2,_1._3,_1._4,_1._5,_1._6,_1._7,_1._8,_2.name,_2.value) |
A1: Read in the JSON text as a table.
A2: Split each record in A1 according to the tab and generate a new table sequence:
A3: Read in file.txt as a table:
A4: Cross join between A2 and A3.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL