"【Question】 I want to match column 1 of file 1 with column 1 of file 2 and then column 2 of file .."

blackduckie RaqForum 28 No.
520 View • 5 Years ago

Key Associations between Multiple Files

text(125)

【Question】

I want to match column 1 of file 1 with column 1 of file 2 and then column 2 of file 1 with column 1 of file 3 and then print the matches. The columns in the files are separated by tabs. For example:

file 1:

fji01dde AIDJFMGKG

dlp02sle VMCFIJGM

cmr03lsp CKEIFJ

file 2:

fji01dde 25 30

dlp02sle 40 50

cmr03lsp 60 70

file 3:

AIDJFMGKG

CKEIFJ

Output:

fji01dde AIDJFMGKG 25 30

cmr03lsp CKEIFJ 60 70

I only want lines that are common in all three files.

The below code works well for the first two files, but I need to incorporate the third file. Any ideas?

#!/usr/bin/env perl

use strict;

my (%file1,%file2);

## Open the 1st file

open(A,"file1");

while(<A>){

chomp;

## Split the current line on tabs into the @F array.

my @F=split(/\t/);

push @{$file1{$F[0]}},@F[1..$#F];

}

## Open the 2nd file

open(B,"file2");

while(<B>){

chomp;

## Split the current line on tabs into the @F array.

my @F=split(/\t/);

if (defined($file1{$F[0]})) {

foreach my $col (@{$file1{$F[0]}}) {

print "$F[0]\t$col\t@F[1..$#F]\n";

}

What would the output be if file3 did not have CKEIFJ

It would be just fji01dde AIDJFMGKG 25 30

A solution:

use strict;

use warnings;

use Text::CSV_XS qw(csv);

my @csv_files = @ARGV;

# Parse all the CSV files into arrays of arrays.

my $data1 = csv(in => $csv_files[0], sep_char => "\t" );

# Parse the other CSV files into hashes of rows keyed on the columns we're going to search on.

my $data2 = csv(in => $csv_files[1],

sep_char => "\t",

headers => ["code", "num1", "num2"\],

key => "code"

);

my $data3 = csv(in => $csv_files[2],

sep_char => "\t",

headers => \["CODE"],

key => "CODE"

);

for my $row1 (@$data1) {

my $row2 = $data2->{$row1->[0]};

my $row3 = $data3->{$row1->[1]};

if($row2 && $row3) {

print join "\t", $row1->[0], $row1->[1], $row2->{num1}, $row2->{num2};

print "\n";

}

【Answer】
This is a typical structured computation. The relationship between file1’s column1 and file2’s column1is a left join; and that between file1’s column2 and file3’s column1 is also a left join. A language that supports structured algorithm can describe the relationship clearly and perform the joins easily. Yet Perl lacks related functions. It is able to achieve this but the code is hard to understand.

Here we handle your question with SPL (Structured Process Language). The code is simple and easy, as shown below:

	A
1	=file("D:/file1.txt").import()
2	=file("D:/file2.txt").import()
3	=file("D:/file3.txt").import()
4	=A1.switch(#1,A2:#1)
5	=A3.switch(#1,A1:#2)
6	=A5.new(#1.#1.#1, #1.#2, #1.#1.#2, #1.#1.#3)