Perl: comparing three files based on columns and extracting specific columns

Time: 2016-02-27 16:13:52

Tags: perl

File 1:

col1 col2 col3 col4 ..... col 15

File 2:

col1 col2 col3 col4 ..... col 15

File 3:

col1 col2 col3 col4 ..... col 15

Every column in each file contains data.

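I need to compare the first four columns of the three files and output the lines from File 3 that are common to all three files, together with col5 from File 1.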

Output:

File 3 (col1 col2 col3 col4 ..... col 15) + File 1 (col5)
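To make the expected output row concrete, here is a minimal Perl sketch. It assumes the columns are tab-separated (the code below splits on /\t/) and reads "+" as appending File 1's col5 as one extra tab-separated column after the full File 3 line; the helper names key_of and output_row are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the comparison key is the first four columns,
# joined with a tab (assumes tab-separated input).
sub key_of {
    my ($line) = @_;
    my @cols = split /\t/, $line;
    return join "\t", @cols[0 .. 3];
}

# Hypothetical helper: the whole File 3 line followed by col5 of File 1.
# Appending col5 as an extra tab-separated column is an assumption.
sub output_row {
    my ($line3, $line1) = @_;
    my $col5 = (split /\t/, $line1)[4];    # col5 is index 4
    return "$line3\t$col5";
}

my $line1 = join "\t", qw(a b c d E1 x);
my $line3 = join "\t", qw(a b c d F3 y);
if (key_of($line1) eq key_of($line3)) {
    # prints a, b, c, d, F3, y, E1 separated by tabs
    print output_row($line3, $line1), "\n";
}
```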

My code:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Expect three input file names on the command line.
    if (@ARGV < 3)
    {
             print "Incomplete parameters!\n";
             exit 1;
    }
    my ($file1, $file2, $file3) = @ARGV;

    # Three-argument open with lexical filehandles; stop if a file cannot be opened.
    open my $fh1, '<', $file1 or die "Cannot open $file1: $!";
    open my $fh2, '<', $file2 or die "Cannot open $file2: $!";
    open my $fh3, '<', $file3 or die "Cannot open $file3: $!";
    open my $f, '>', "output.txt" or die "Cannot open output.txt: $!";

    my @arr1=<$fh1>;
    my @arr2=<$fh2>;
    my @arr3=<$fh3>;
    close $fh1;
    close $fh2;
    close $fh3;
    my %chash;
    for (@arr1)
    {
            chomp;
            my($col1,$col2,$col3,$col4,$col5,$rest)=split(/\t/);
            my $ckey="$col1$col2$col3$col4";
            $chash{$ckey}=1;
    }

    for (@arr2)
    {
            chomp;
            my($hit1,$hit2,$hit3,$hit4,$hit5,$rest)=split(/\t/);
            my $ckey="$hit1$hit2$hit3$hit4";
            $chash{$ckey}++;
    }
    for (@arr3)
    {
            chomp;
            my($val1,$val2,$val3,$val4,$rest)=split(/\t/);
            my $ckey="$val1$val2$val3$val4";
            $chash{$ckey}++;
            if($chash{$ckey} == 3)
            {
                    # this key has been seen in both previous files
                    print $f "$_\n";
            }
    }

This code only outputs the common lines. Can anybody help me extract col5 from File 1 together with the common lines from File 3?
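One pitfall in the key-building above: "$col1$col2$col3$col4" concatenates the columns with no separator, so different column values can collapse into the same key. The sketch below, with hypothetical helpers concat_key and tab_key, shows the collision and shows that joining with a tab (which cannot occur inside a tab-separated column) keeps the keys distinct.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Plain concatenation, as in "$col1$col2$col3$col4".
sub concat_key { return join '', @_[0 .. 3] }

# Join with a tab, which cannot appear inside a tab-separated column.
sub tab_key { return join "\t", @_[0 .. 3] }

my @row_a = ('ab', 'c', 'd', 'e');
my @row_b = ('a', 'bc', 'd', 'e');

printf "concatenated keys equal: %s\n",
    concat_key(@row_a) eq concat_key(@row_b) ? 'yes' : 'no';    # yes
printf "tab-joined keys equal:   %s\n",
    tab_key(@row_a) eq tab_key(@row_b) ? 'yes' : 'no';          # no
```

Perl also accepts $chash{$col1,$col2,$col3,$col4}, which joins the subscripts with $; (by default "\034"); either form works as long as the separator cannot occur inside a column.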

1 answer:

Answer 0 (score: 0)

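When the print statement is reached, $col5 is out of scope: it is declared with my inside the loop over @arr1, and that loop has already finished before the loop over @arr3 runs. So process the files in reverse order, so that $col5 is in scope at the print statement.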

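    # Replaces the three loops in the question; %chash, $f and @arr1..@arr3
    # are set up as before.
    my %line3;    # key => full File 3 line, printed once the key is confirmed
    for (@arr3)
    {
        chomp;
        my($val1,$val2,$val3,$val4,$rest)=split(/\t/);
        my $ckey="$val1$val2$val3$val4";
        $chash{$ckey} = 1;
        $line3{$ckey} = $_;    # remember the File 3 line for the final print
    }
    for (@arr2)
    {
        chomp;
        my($hit1,$hit2,$hit3,$hit4,$rest)=split(/\t/); # you don't need $hit5 here
        my $ckey="$hit1$hit2$hit3$hit4";
        $chash{$ckey}++;
    }
    for (@arr1)
    {
        chomp;
        my($col1,$col2,$col3,$col4,$col5,$rest)=split(/\t/);
        my $ckey="$col1$col2$col3$col4";
        $chash{$ckey}++;

        if($chash{$ckey} == 3)
        {
            # this key has been seen in both previous files:
            # print the File 3 line plus col5 from File 1 ($col5 is in scope here)
            print $f "$line3{$ckey}\t$col5\n";
        }
    }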
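The single %chash counter in both versions assumes each key occurs at most once per file: a key repeated inside one file can push the count to 3 even if it is missing from another file. A sketch of an alternative that keeps the question's file order, under the same tab-separated assumption: remember col5 per key from File 1 in one hash, mark File 2 keys in a second hash, then stream File 3 and print each matching line with its col5. The names merge_common and key_of are hypothetical; like the question's code, it writes output.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: first four tab-separated columns form the key.
sub key_of {
    my ($line) = @_;
    return join "\t", (split /\t/, $line)[0 .. 3];
}

# Takes three readable filehandles (File 1, 2, 3) and an output filehandle.
sub merge_common {
    my ($fh1, $fh2, $fh3, $out) = @_;

    my %col5_of;    # key => col5 from File 1 (first occurrence wins)
    while (my $line = <$fh1>) {
        chomp $line;
        $col5_of{ key_of($line) } //= (split /\t/, $line)[4];
    }

    my %in_file2;   # key => 1 if the key occurs anywhere in File 2
    while (my $line = <$fh2>) {
        chomp $line;
        $in_file2{ key_of($line) } = 1;
    }

    # Stream File 3 last: each matching line is printed whole,
    # followed by the saved col5 from File 1.
    while (my $line = <$fh3>) {
        chomp $line;
        my $key = key_of($line);
        next unless exists $col5_of{$key} && $in_file2{$key};
        print {$out} "$line\t$col5_of{$key}\n";
    }
}

# Run as a script only when three file names are given.
if (@ARGV == 3) {
    my @fh;
    for my $name (@ARGV) {
        open my $fh, '<', $name or die "Cannot open $name: $!";
        push @fh, $fh;
    }
    open my $out, '>', 'output.txt' or die "Cannot open output.txt: $!";
    merge_common(@fh, $out);
    close $out or die "Cannot close output.txt: $!";
}
```

Usage would be something like perl merge3.pl file1.txt file2.txt file3.txt (the script name is hypothetical). Because membership is tracked per file, duplicates inside a file no longer affect the result, and File 3 is read line by line instead of being loaded into an array; only the keys of File 1 and File 2 are held in memory.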