Row-to-column Transposition of Non-header Text Files
【Question】
I have a tab delimited file that looks like this:
file.txt gives:
1 299
1 150
1 50
1 57
2 -45
2 62
3 515
3 215
3 -315
3 -35
3 3
3 6789
3 34
5 66
5 1334
5 123
I’d like to use Unix commands to get a tab-delimited file that based on values in column#1, each column of the output file will hold all relevant values of column#2 (I’m using separator “|” instead of tab only to illustrate my desired output file):
299 | -45| 515| 66
150| 62| 215| 1334
50 | | -315 |
57 | | -35 |
| | 3 |
The corresponding Headers (1,2,3,5; based on column#1 values) could be a nice addition to the code (as shown below), but the main request is to split the information of the first file into separated columns. Thanks!
1| 2| 3| 5
299 | -45| 515| 66
150| 62| 215| 1334
50 | | -315 |
57 | | -35 |
| | 3 |
Below is a solution:
awk '
BEGIN{max=0;}
{
d\[$1\]\[length(d\[$1\])+1\] = $2;
if(length(d\[$1\])>max)
max = length(d\[$1\]);
}
END{
PROCINFO\["sorted\_in"\] = "@ind\_num_asc";
line = "";
flag = 0;
for(j in d){
line = line (flag?"\\t|\\t":"") j;
flag = 1;
}
print line;
for(i=1; i<=max; ++i){
line = "";
flag = 0;
for(j in d){
line = line (flag?"\\t|\\t":"") d\[j\]\[i\];
flag = 1;
}
print line;
}
}' file.txt
you get
1 | 2 | 3 | 5
299 | -45 | 515 | 66
150 | 62 | 215 | 1334
50 | | -315 |
57 | | -35 |
| | 3 |
Or, you can use python …. for example, in split2Columns.py
import sys
records = \[line.split() for line in open(sys.argv\[1\])\]
import collections
records_dict = collections.defaultdict(list)
for key, val in records:
records_dict\[key\].append(val)
from itertools import izip_longest
print "\\t|\\t".join(records_dict.keys())
print "\\n".join(("\\t|\\t".join(map(str,l)) for l in izip\_longest(*records\_dict.values(), fillvalue="")))
【Answer】
Group the text file by column#1 to generate the expected result set, and then enter members of each group into the result set in vertical form.
Here post-grouping operations and row-to-column transposition are involved. Below is the SPL script, which is simple and easy to understand:
A |
|
1 |
=file("D:/file.txt").import() |
2 |
=A1.group(#1) |
3 |
=create(${A2.(#1).concat(",")}).paste(${A2.len().("A2("/~/").(#2)").concat(",")}) |
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL