Perl: match lines from one file against another file, then output the matching line and the next line to a new file

Time: 2013-08-12 16:52:12

Tags: perl

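Could anyone modify the code so that it searches file 2 for the sequence names listed in file 1 and, when a name matches, copies the matching line and the line after it from file 2 to the outfile? Right now the code copies only the matching header, not the line after it, which is the sequence that belongs in the outfile. Thanks.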

For example:

  

File 1:

SEQUENCE 1 NAME
SEQUENCE 2 NAME
SEQUENCE 3 NAME

File 2:

SEQUENCE 1 NAME
AGTCAGTCAGTCAGTCAGTC
SEQUENCE 2 NAME
AAGGGTTTTCCCCCCAAAAA
SEQUENCE 3 NAME
GGGGTTTTTTTTTTAAAAAC
SEQUENCE 4 NAME
AAGTCCCCCCCCCCAAGGTT

  

OUTFILE:

SEQUENCE 1 NAME
AGTCAGTCAGTCAGTCAGTC
SEQUENCE 2 NAME
AAGGGTTTTCCCCCCAAAAA
SEQUENCE 3 NAME
GGGGTTTTTTTTTTAAAAAC

Code:

use strict;
use warnings;

my $f1 = 'FILE1.fasta';
open FILE1, "$f1" or die "Could not open file \n";
my $f2= 'FILE2.fasta';
open FILE2, "$f2" or die "Could not open file \n";

my $outfile = $ARGV[1];

my @outlines;
my $n=0;
foreach (<FILE1>) {
    my $y = 0;
    my $outer_text = $_ ;


    seek(FILE2,0,0);
    foreach (<FILE2>) {
        my $inner_text = $_;

        if($outer_text eq $inner_text) {    

            print "$outer_text\n";
            push(@outlines, $outer_text);
            $n++;

        }
    }
}
open (OUTFILE, "sequences.fasta") or die "Cannot open $outfile \n";
print OUTFILE @outlines;
close OUTFILE;

2 answers:

Answer 0 (score: 0)

For a very large FILE1, the %seen hash could be tied to some DBM storage.

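use strict;
use warnings;

my $f1 = 'FILE1.fasta';
open FILE1, "<", $f1 or die "Cannot open $f1: $!";
my $f2 = 'FILE2.fasta';
open FILE2, "<", $f2 or die "Cannot open $f2: $!";

# my $outfile = $ARGV[1];
open OUTFILE, ">", "sequences.fasta" or die "Cannot open sequences.fasta: $!";

# Remember every sequence name listed in FILE1.
my %seen;
while (<FILE1>) {
    chomp;
    $seen{$_} = 1;
}

# FILE2 alternates header and sequence lines, so read them in pairs.
while (<FILE2>) {
    my $next_line = <FILE2>;
    last unless defined $next_line;   # header without a sequence line
    chomp;

    if ($seen{$_}) {
        print OUTFILE "$_\n", $next_line;
    }
}
close OUTFILE;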
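
The remark about tying %seen to DBM storage can be sketched as follows. This is only an illustration under assumptions: the answer does not name a DBM module, so SDBM_File (bundled with Perl) is my choice here, and the helper name filter_with_dbm and its parameters are hypothetical. It also assumes file 2 strictly alternates header and sequence lines, as in the question.

```perl
use strict;
use warnings;
use Fcntl;        # exports O_RDWR and O_CREAT
use SDBM_File;    # DBM implementation bundled with Perl (an assumed choice)

# Hypothetical helper: keep the names from $names_file in an on-disk hash,
# then copy each matching header and the line after it from $fasta_file.
# Returns the number of records written to $out_file.
sub filter_with_dbm {
    my ($names_file, $fasta_file, $out_file, $dbm_base) = @_;

    # SDBM_File stores the hash in "$dbm_base.pag" and "$dbm_base.dir".
    tie(my %seen, 'SDBM_File', $dbm_base, O_RDWR | O_CREAT, 0666)
        or die "Cannot tie $dbm_base: $!";

    open my $names, '<', $names_file or die "Cannot open $names_file: $!";
    while (my $name = <$names>) {
        chomp $name;
        $seen{$name} = 1 if length $name;
    }
    close $names;

    open my $fasta, '<', $fasta_file or die "Cannot open $fasta_file: $!";
    open my $out,   '>', $out_file   or die "Cannot open $out_file: $!";

    my $copied = 0;
    while (my $header = <$fasta>) {
        my $sequence = <$fasta>;
        last unless defined $sequence;    # header without a sequence line
        chomp $header;
        if ($seen{$header}) {
            print {$out} "$header\n", $sequence;
            $copied++;
        }
    }

    close $out;
    close $fasta;
    untie %seen;
    return $copied;
}
```

A call such as filter_with_dbm('FILE1.fasta', 'FILE2.fasta', 'sequences.fasta', 'seen_names') would mirror the script above. SDBM limits the size of each key/value pair, so unusually long header lines may call for a different DBM module such as DB_File.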

Answer 1 (score: 0)

I would put the contents of file 2 into a hash, then check whether each record in file 1 is in the hash:

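#!perl
use strict;
use warnings;

my $f2 = 'FILE2.fasta';
open FILE2, "<", $f2 or die "Could not open $f2: $!\n";

my $k;
my $v;
my %hash;

# Store each header line of file 2 as a key and the following line as its value.
while (defined($k = <FILE2>)) {
        chomp $k;
        $v = <FILE2>;
        last unless defined $v;   # header without a sequence line
        chomp $v;                 # avoid printing a blank line after each sequence
        $hash{$k} = $v;
}
close FILE2;

my $f1 = 'FILE1.fasta';
open FILE1, "<", $f1 or die "Could not open $f1: $!\n";

open (OUTFILE, ">", "sequences.fasta") or die "Cannot open sequences.fasta: $!\n";
# Print every name from file 1 that is a known header, followed by its sequence.
while (<FILE1>) {
        chomp;
        if (exists($hash{$_})) {
                print OUTFILE "$_\n";
                print OUTFILE "$hash{$_}\n";
        }
}

close FILE1;
close OUTFILE;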
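
To try the hash-lookup approach without touching any files, here is a minimal sketch that runs it on the sample data from the question through in-memory filehandles. The helper name extract_records and the variable names are hypothetical, and it assumes file 2 strictly alternates header and sequence lines.

```perl
use strict;
use warnings;

# Hypothetical helper: index header => sequence from $fasta_fh, then print
# the records whose header appears in $names_fh, in the order of $names_fh.
# Returns the number of records written to $out_fh.
sub extract_records {
    my ($names_fh, $fasta_fh, $out_fh) = @_;

    my %sequence_for;
    while (defined(my $header = <$fasta_fh>)) {
        my $sequence = <$fasta_fh>;
        last unless defined $sequence;    # header without a sequence line
        chomp($header, $sequence);
        $sequence_for{$header} = $sequence;
    }

    my $copied = 0;
    while (defined(my $name = <$names_fh>)) {
        chomp $name;
        next unless exists $sequence_for{$name};
        print {$out_fh} "$name\n$sequence_for{$name}\n";
        $copied++;
    }
    return $copied;
}

# Sample data from the question, held in strings instead of files.
my $names = "SEQUENCE 1 NAME\nSEQUENCE 2 NAME\nSEQUENCE 3 NAME\n";
my $fasta = "SEQUENCE 1 NAME\nAGTCAGTCAGTCAGTCAGTC\n"
          . "SEQUENCE 2 NAME\nAAGGGTTTTCCCCCCAAAAA\n"
          . "SEQUENCE 3 NAME\nGGGGTTTTTTTTTTAAAAAC\n"
          . "SEQUENCE 4 NAME\nAAGTCCCCCCCCCCAAGGTT\n";

open my $names_fh, '<', \$names or die "Cannot read names: $!";
open my $fasta_fh, '<', \$fasta or die "Cannot read sequences: $!";
my $result = '';
open my $out_fh, '>', \$result or die "Cannot write result: $!";

my $count = extract_records($names_fh, $fasta_fh, $out_fh);
close $out_fh;

print $result;    # the three matching records; SEQUENCE 4 is skipped
```

Because the lookup runs over file 1, the output follows the order of file 1, whereas the first answer's loop follows the order of file 2.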