Find Unique Columns from Text Files
【Question】
I have a large number of individual files, each containing six columns (the number of rows can vary). A simple example:
1 0 0 0 0 0
0 1 1 1 0 0
I am trying to identify how many unique columns each file has (two columns are identical when their numbers and the order of those numbers match). In this case the answer is 3. Is there a simple one-liner to do this? I know it is easy to compare one column with another, but how do I find all identical columns?
【Answer】
Besides Awk, you can do this in SPL (Structured Process Language), which handles this kind of multi-step logic compactly. To count the unique columns in every file under a data directory (F:\files\data in this example), you can use the following one-liner:
| | A |
|---|---|
| 1 | =directory@p("F:\\files\\data").new(~:file,(a=file(~).import(),a.fno().(a.field(~)).id().count()):count) |
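A1: Go through the files in the directory in order, count the unique columns in each one, and write the results to a two-dimensional table with a file field and a count field. For each file, import() reads the contents into a table a, a.fno() gives the number of columns, a.field(~) fetches the values of each column, id() removes duplicate columns, and count() counts what is left.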
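For readers who want to cross-check the result outside SPL, the same logic can be sketched in Python. This is only an illustrative sketch, not part of the SPL solution: the function names count_unique_columns and count_per_file are made up for this example, and it assumes the files are whitespace-separated and that every row in a file has the same number of fields (zip stops at the shortest row).

```python
from pathlib import Path


def count_unique_columns(text: str) -> int:
    """Count the distinct columns in whitespace-separated text."""
    # Split every non-blank line into its fields.
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # zip(*rows) transposes rows into columns; each column becomes a tuple,
    # so a set keeps one copy of every distinct column.
    return len(set(zip(*rows)))


def count_per_file(directory: str) -> dict:
    """Map each file name in the directory to its unique-column count."""
    return {
        p.name: count_unique_columns(p.read_text())
        for p in sorted(Path(directory).iterdir())
        if p.is_file()
    }


# The example from the question: columns (1,0), (0,1) x3, (0,0) x2.
print(count_unique_columns("1 0 0 0 0 0\n0 1 1 1 0 0\n"))  # prints 3
```

Calling count_per_file on a directory path returns one entry per file, which corresponds to the file and count fields of the SPL result (here keyed by file name rather than full path).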