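I am trying to match the first three columns of one file against columns 0, 3, and 4 of a second file. I am having trouble with the code below. Please help. Thanks.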
#!usr/bin/perl
use strict;
use warnings;
my $infile1 = $ARGV[0];
my $infile2 = $ARGV[1];
my $outfile = $ARGV[2];
open (INFILE1,"<", $infile1) || die "Cannot open $infile1:$!\n";
open (INFILE2, "<", $infile2) || die "Cannot open $infile2:$!\n";
open (OUTFILE, ">", $outfile) || die "Cannot open $outfile:$!\n";
my @array1;
my @array2;
my @array3;
my @array4;
my $_;
while (<INFILE1>) {
chomp;
@array1 = split (' ', $_);
push (@array2, "@array1\n");
#print "@array2\n";
}
while (<INFILE2>) {
chomp;
@array3 = split (' ', $_);
push (@array4, "@array3\n");
#print "@array4\n";
}
#print "@array2\n";
#print "@array4\n";
foreach my $array2(@array2) {
my @line = split(/\s+/,$array2);
my $chr1 = $line[0];
my $start1 = $line[1];
my $end1 = $line[2];
#print "$line[0]\n";
foreach my $array4(@array4) {
my @values = split(/\s+/, $array4);
my $chr2 = $values[0];
my $start2 = $values[3];
my $end2 = $values[4];
if (($chr1 eq $chr2) && ($start1 eq $start2) && ($end1 eq $end2)) {
#print "$start2\n";
print "$chr2\t$start2\t$end2\n";
}
}
}
A few lines of file1.txt:
chr10 40095550 40096075
chr10 40102275 40102575
chr10 40139575 40140100
A few lines of file2.txt:
chr1 mm10_knownGene exon 3205904 3207317 0.000000 - . gene_id "uc007aet.1"; transcript_id "uc007aet.1";
chr1 mm10_knownGene exon 3213439 3215632 0.000000 - . gene_id "uc007aet.1"; transcript_id "uc007aet.1";
chr1 mm10_knownGene stop_codon 3216022 3216024 0.000000 - . gene_id "uc007aeu.1"; transcript_id "uc007aeu.1";
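Answer 0 (score: 1)
The solution to this problem is in perldata; see the section on hashes. Hashes are associative arrays of key-value pairs, and using one makes the vast majority of your code redundant: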
my %exists;
while ( <INFILE1> ) {
my ( $chr, $firstnum, $secondnum) = split;
$exists{$chr}{$firstnum}{$secondnum}++;
}
while ( <INFILE2> ) {
my ( $chr, $mm, $thing, $firstnum, $secondnum ) = split;
print if $exists{$chr}{$firstnum}{$secondnum};
}
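As a self-contained illustration of the lookup above, here is a minimal sketch that runs the same nested-hash idea on in-memory rows instead of files. The sample rows and the helper name `matching_rows` are made up for this example (the rows quoted in the question share no coordinates), so treat it as a sketch under those assumptions rather than a drop-in script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical file1-style rows: "chr start end".
my @intervals = (
    'chr1 3205904 3207317',
    'chr10 40095550 40096075',
);

# Hypothetical GTF-style rows: chr is field 0, start and end are fields 3 and 4.
my @annotations = (
    q{chr1 mm10_knownGene exon 3205904 3207317 0.000000 - . gene_id "uc007aet.1";},
    q{chr1 mm10_knownGene exon 3213439 3215632 0.000000 - . gene_id "uc007aet.1";},
);

# Return the annotation rows whose (chr, start, end) also appear among the intervals.
sub matching_rows {
    my ($intervals, $annotations) = @_;

    # Index the first list once: $exists{chr}{start}{end} is true if that triple was seen.
    my %exists;
    for my $row (@$intervals) {
        my ($chr, $start, $end) = split ' ', $row;
        $exists{$chr}{$start}{$end}++;
    }

    # One hash lookup per annotation row replaces the inner loop.
    my @hits;
    for my $row (@$annotations) {
        my ($chr, $start, $end) = (split ' ', $row)[0, 3, 4];
        push @hits, $row if $exists{$chr}{$start}{$end};
    }
    return @hits;
}

print "$_\n" for matching_rows(\@intervals, \@annotations);
```

The nested loops in the question compare every line of the first file with every line of the second. The hash version reads each input once and does a single lookup per line of the second input.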
I would also suggest using lexical filehandles with the three-argument open, instead of bareword filehandles, e.g.:
open ( my $infile1_fh, "<", $infile1 ) or die $!;
and then:
while ( <$infile1_fh> ) {
because the handles are then lexically scoped rather than global.
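To make the scoping point concrete, here is a small sketch of a lexical filehandle that lives only inside one subroutine. The helper name `count_lines` and the sample text are hypothetical. The example opens an in-memory file (a reference to a string) so that it runs without any files on disk; passing a file name would work the same way.

```perl
use strict;
use warnings;

# Count the lines readable from $source, which may be a file name
# or a reference to a string (Perl then opens an in-memory file).
sub count_lines {
    my ($source) = @_;

    # $fh is a lexical variable: it is visible only inside this subroutine.
    open(my $fh, '<', $source) or die "Cannot open input: $!";

    my $count = 0;
    while (my $line = <$fh>) {
        $count++;
    }

    # No explicit close is needed: when $fh goes out of scope at the end
    # of the subroutine, Perl closes the handle automatically.
    return $count;
}

# Hypothetical sample text in the same shape as file1.txt.
my $sample = "chr10 40095550 40096075\nchr10 40102275 40102575\n";
print count_lines(\$sample), "\n";
```

A bareword handle such as INFILE1 is a package-level global, so two subroutines that both use the same bareword name can interfere with each other. A lexical handle avoids that.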