Merging files based on a mapping in another file

Time: 2017-05-06 14:14:11

Tags: perl parsing join

I have written a script in Perl that merges two files based on a mapping in a third file. The reason I did not use join is that the lines will not always match. The code works, but it emits a warning that does not seem to affect the output:

Use of uninitialized value in join or string at join.pl line 43, <$fh> line 21.

#!/usr/bin/perl

use strict;
use warnings;
use diagnostics;
use Tie::File;
use Scalar::Util qw(looks_like_number);

chomp( my $infile  = $ARGV[0] );
chomp( my $infile1 = $ARGV[1] );
chomp( my $infile2 = $ARGV[2] );
chomp( my $outfile = $ARGV[3] );

open my $mapfile, '<', $infile  or die "Could not open $infile: $!";
open my $file1,   '<', $infile1 or die "Could not open $infile1: $!";
open my $file2,   '<', $infile2 or die "Could not open $infile2: $!";
tie my @tieFile1, 'Tie::File', $infile1 or die "Could not open $infile1: $!";
tie my @tieFile2, 'Tie::File', $infile2 or die "Could not open $infile2: $!";
open my $output, '>', $outfile or die "Could not open $outfile: $!";

my %map1;
my %map2;

# This loop will read two input files and populate two hashes
# using the coordinates (field 2) and the current line number
while ( my $line1 = <$file1>, my $line2 = <$file2> ) {
    my @row1 = split( "\t", $line1 );
    my @row2 = split( "\t", $line2 );
    # $. holds the line number
    $map1{$row1[1]} = $.;
    $map2{$row2[1]} = $.;
}
close($file1);
close($file2);

while ( my $line = <$mapfile> ) {
    chomp $line;
    my @row = split( "\t", $line );
    my $species1   = $row[1];
    my $reference1 = $map1{$species1};
    my $species2   = $row[3];
    my $reference2 = $map2{$species2};
    my @nomatch = ("NA", "", "NA", "", "", "", "", "NA", "NA");
    # test numeric
    if ( looks_like_number($reference1) && looks_like_number($reference2) ) {
        # do the do using the maps
        print $output join("\t", $tieFile1[$reference1], $tieFile2[$reference2]), "\n";
    }
    elsif ( looks_like_number($reference1) ) {
        print $output join("\t", $tieFile1[$reference1], @nomatch), "\n";
    }
    elsif ( looks_like_number($reference2) ) {
        print $output join("\t", @nomatch, $tieFile2[$reference2]), "\n";
    }
}
close($output);
untie @tieFile1;
untie @tieFile2;

As I am relatively new to Perl, I cannot work out what causes this warning. Any help resolving it, or any advice about my code, would be greatly appreciated. I have provided sample input and output below.

INPUT_1:

Scf_3L  12798910    T   0   41  0   0   NA  NA
Scf_3L  12798911    C   0   0   43  0   NA  NA
Scf_3L  12798912    A   42  0   0   0   NA  NA
Scf_3L  12798913    G   0   0   0   44  NA  NA
Scf_3L  12798914    T   0   42  0   0   NA  NA
Scf_3L  12798915    G   0   0   0   44  NA  NA
Scf_3L  12798916    T   0   42  0   0   NA  NA
Scf_3L  12798917    A   41  0   0   0   NA  NA
Scf_3L  12798918    G   0   0   0   43  NA  NA
Scf_3L  12798919    T   0   43  0   0   NA  NA
Scf_3L  12798920    T   0   41  0   0   NA  NA

INPUT_2:

3L  12559896    T   0   31  0   0   NA  NA
3L  12559897    C   0   0   33  0   NA  NA
3L  12559898    A   34  0   0   0   NA  NA
3L  12559899    G   0   0   0   33  NA  NA
3L  12559900    T   0   34  0   0   NA  NA
3L  12559901    G   0   0   0   33  NA  NA
3L  12559902    T   0   33  0   0   NA  NA
3L  12559903    A   33  0   0   0   NA  NA
3L  12559904    G   0   0   0   33  NA  NA
3L  12559905    T   0   34  0   0   NA  NA
3L  12559906    T   0   33  0   0   NA  NA

MAP:

3L  12798910    T   12559896    T
3L  12798911    C   12559897    C
3L  12798912    A   12559898    A
3L  12798913    G   12559899    G
3L  12798914    T   12559900    T
3L  12798915    G   12559901    G
3L  12798916    T   12559902    T
3L  12798917    A   12559903    A
3L  12798918    G   12559904    G
3L  12798919    T   12559905    T
3L  12798920    T   12559906    T

OUTPUT:

Scf_3L  12798910    T   0   41  0   0   NA  NA    3L    12559896    T   0   31  0   0   NA  NA
Scf_3L  12798911    C   0   0   43  0   NA  NA    3L    12559897    C   0   0   33  0   NA  NA
Scf_3L  12798912    A   42  0   0   0   NA  NA    3L    12559898    A   34  0   0   0   NA  NA
Scf_3L  12798913    G   0   0   0   44  NA  NA    3L    12559899    G   0   0   0   33  NA  NA
Scf_3L  12798914    T   0   42  0   0   NA  NA    3L    12559900    T   0   34  0   0   NA  NA
Scf_3L  12798915    G   0   0   0   44  NA  NA    3L    12559901    G   0   0   0   33  NA  NA
Scf_3L  12798916    T   0   42  0   0   NA  NA    3L    12559902    T   0   33  0   0   NA  NA
Scf_3L  12798917    A   41  0   0   0   NA  NA    3L    12559903    A   33  0   0   0   NA  NA
Scf_3L  12798918    G   0   0   0   43  NA  NA    3L    12559904    G   0   0   0   33  NA  NA
Scf_3L  12798919    T   0   43  0   0   NA  NA    3L    12559905    T   0   34  0   0   NA  NA
Scf_3L  12798920    T   0   41  0   0   NA  NA    3L    12559906    T   0   33  0   0   NA  NA

1 answer:

Answer 0 (score: 2)

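The immediate problem is that the indices of the tied arrays start at zero, while the line numbers in $. start at 1. That means you need to subtract one from $. or from the $reference variables before using them as indices. So your resulting data was wrong from the start, and without the warning you might never have noticed!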
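To illustrate the off-by-one, here is a minimal sketch rather than part of the fix itself. The sub name index_by_second_field and the three sample lines are made up for the example, and an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: map the second tab-separated field of each line
# to a zero-based index, suitable for indexing an array of the same lines.
sub index_by_second_field {
    my ($text) = @_;
    my %index;
    open my $fh, '<', \$text or die "Could not open in-memory file: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        my $key = ( split /\t/, $line )[1];
        $index{$key} = $. - 1;    # $. is 1 on the first line; arrays start at 0
    }
    close $fh;
    return \%index;
}

# Made-up sample data, not taken from the question's files.
my $text  = "3L\t100\tT\n3L\t101\tC\n3L\t102\tA\n";
my @lines = split /\n/, $text;
my $index = index_by_second_field($text);

# Prints the line whose second field is 101.
print $lines[ $index->{101} ], "\n";
```

Storing $. unchanged would make every lookup return the following line, and the last key would point one element past the end of the array, which is where an uninitialized value can come from.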

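I fixed that and also tidied up your code. Mostly I added use autodie so that there is no need to check the status of IO operations (except for Tie::File), changed to list assignments, moved the code that reads the files into a subroutine, and added a code block so that the lexical file handles are closed automatically.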
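As a small sketch of what autodie changes, assuming a made-up path that does not exist: a failing open throws an exception on its own, so the explicit "or die" can be dropped. The eval is only there so that the example can report the failure instead of exiting.

```perl
use strict;
use warnings;
use autodie;    # open, close and friends now throw an exception on failure

# Hypothetical path, chosen so that the open is expected to fail.
my $missing = '/no/such/dir/no-such-file.txt';

my $opened = eval {
    # $fh is lexical, so it would be closed automatically
    # when the enclosing block exits.
    open my $fh, '<', $missing;    # no "or die" needed under autodie
    1;
};

print "open failed as expected: $@" unless $opened;
```

tie is not covered by autodie, which is why the Tie::File calls keep their own "or die".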

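I also used the tied arrays to build the %map hashes instead of opening the files separately, which means their values are already zero-based, as they need to be.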
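The hash-building idiom can be tried on its own. In this sketch a plain array with three made-up lines stands in for a tied array, which is an assumption made to keep the example self-contained.

```perl
use strict;
use warnings;

# Stand-in for a Tie::File array: one element per line, no trailing newlines.
my @file = ( "3L\t12559896\tT", "3L\t12559897\tC", "3L\t12559898\tA" );

# Key: the second field of each line. Value: the array index of that line.
# The limit of 3 stops split from breaking up fields that are never used.
my %map = map { ( split /\t/, $file[$_], 3 )[1] => $_ } 0 .. $#file;

for my $key ( sort keys %map ) {
    print "$key is at index $map{$key}\n";
}
```

Because the values come from the range 0 .. $#file, they can be used directly as indices into the same array.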

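Oh, and I removed looks_like_number, because the $reference variables must be either numeric or undef, since those are the only values we put into the hashes. The correct way to check that a value is not undef is to use the defined operator.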
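A short sketch of the defined test follows. The sub name fields_for, the sample lines and the shortened @nomatch list are all made up for the example.

```perl
use strict;
use warnings;

# Made-up sample data; @nomatch is shortened for illustration.
my @file    = ( "3L\t100\tT", "3L\t101\tC" );
my %map     = ( 100 => 0, 101 => 1 );
my @nomatch = ( "NA", "", "NA" );

# Hypothetical helper: return the stored line for a key,
# or the placeholder fields if the key is absent from the map.
sub fields_for {
    my ($key) = @_;
    my $ref = $map{$key};
    # A plain truth test would wrongly reject index 0; defined does not.
    return defined $ref ? $file[$ref] : @nomatch;
}

print join( "\t", fields_for(100) ), "\n";    # key present, stored at index 0
print join( "\t", fields_for(999) ), "\n";    # key absent, placeholder fields
```

With zero-based indices the first line is stored under 0, which is false in boolean context, so defined is the safe test here.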

#!/usr/bin/perl

use strict;
use warnings 'all';
use autodie;

use Fcntl 'O_RDONLY';
use Tie::File;

my ( $mapfile, $infile1, $infile2, $outfile ) = @ARGV;

{
    tie my @file1, 'Tie::File' => $infile1, mode => O_RDONLY
        or die "Could not open $infile1: $!";

    tie my @file2, 'Tie::File' => $infile2, mode => O_RDONLY
        or die "Could not open $infile2: $!";

    my %map1 = map { (split /\t/, $file1[$_], 3)[1] => $_ } 0 .. $#file1;
    my %map2 = map { (split /\t/, $file2[$_], 3)[1] => $_ } 0 .. $#file2;

    open my $map_fh, '<', $mapfile;

    open my $out_fh, '>', $outfile;

    while ( <$map_fh> ) {
        chomp;
        my @row = split /\t/;

        my ( $species1, $species2 ) = @row[1,3];
        my $reference1 = $map1{$species1};
        my $reference2 = $map2{$species2};

        my @nomatch    = ( "NA", "", "NA", "", "", "", "", "NA", "NA" );

        my @fields = (
            ( defined $reference1 ? $file1[$reference1] : @nomatch),
            ( defined $reference2 ? $file2[$reference2] : @nomatch),
        );

        print $out_fh join( "\t", @fields ), "\n";
    }
}

Output

Scf_3L  12798910    T   0   41  0   0   NA  NA  NA      NA                  NA  NA
Scf_3L  12798911    C   0   0   43  0   NA  NA  NA      NA                  NA  NA
Scf_3L  12798912    A   42  0   0   0   NA  NA  NA      NA                  NA  NA
Scf_3L  12798913    G   0   0   0   44  NA  NA  NA      NA                  NA  NA
Scf_3L  12798914    T   0   42  0   0   NA  NA  NA      NA                  NA  NA
Scf_3L  12798915    G   0   0   0   44  NA  NA  NA      NA                  NA  NA
Scf_3L  12798916    T   0   42  0   0   NA  NA  NA      NA                  NA  NA
Scf_3L  12798917    A   41  0   0   0   NA  NA  NA      NA                  NA  NA
Scf_3L  12798918    G   0   0   0   43  NA  NA  NA      NA                  NA  NA
Scf_3L  12798919    T   0   43  0   0   NA  NA  NA      NA                  NA  NA
Scf_3L  12798920    T   0   41  0   0   NA  NA  NA      NA                  NA  NA