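Could anyone modify this code so that it searches file 2 for the sequence names listed in file 1 and, when a name matches, copies the matching header and the line after it (the sequence) to the outfile? Right now the code copies only the matching header to the outfile, not the sequence line that follows it. Thanks.
For example:
File 1: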
SEQUENCE 1 NAME
SEQUENCE 2 NAME
SEQUENCE 3 NAME
File 2:
SEQUENCE 1 NAME
AGTCAGTCAGTCAGTCAGTC
SEQUENCE 2 NAME
AAGGGTTTTCCCCCCAAAAA
SEQUENCE 3 NAME
GGGGTTTTTTTTTTAAAAAC
SEQUENCE 4 NAME
AAGTCCCCCCCCCCAAGGTT
etc.
OUTFILE:
SEQUENCE 1 NAME
AGTCAGTCAGTCAGTCAGTC
SEQUENCE 2 NAME
AAGGGTTTTCCCCCCAAAAA
SEQUENCE 3 NAME
GGGGTTTTTTTTTTAAAAAC
code:
use strict;
use warnings;
my $f1 = 'FILE1.fasta';
open FILE1, "$f1" or die "Could not open file \n";
my $f2= 'FILE2.fasta';
open FILE2, "$f2" or die "Could not open file \n";
my $outfile = $ARGV[1];
my @outlines;
my $n=0;
foreach (<FILE1>) {
    my $y = 0;
    my $outer_text = $_ ;
    seek(FILE2,0,0);
    foreach (<FILE2>) {
        my $inner_text = $_;
        if($outer_text eq $inner_text) {
            print "$outer_text\n";
            push(@outlines, $outer_text);
            $n++;
        }
    }
}
open (OUTFILE, "sequences.fasta") or die "Cannot open $outfile \n";
print OUTFILE @outlines;
close OUTFILE;
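Answer 0 (score: 0)
Store the names from FILE1 in a %seen hash, then read FILE2 two lines at a time and print each pair whose header was seen. For a very large FILE1, the %seen hash can be tied to some DBM storage.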
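use strict;
use warnings;

my $f1 = 'FILE1.fasta';
open FILE1, "<", $f1 or die "Could not open $f1: $!";
my $f2 = 'FILE2.fasta';
open FILE2, "<", $f2 or die "Could not open $f2: $!";
# my $outfile = $ARGV[1];
open OUTFILE, ">", "sequences.fasta" or die "Could not open sequences.fasta: $!";

# Remember every sequence name listed in FILE1.
# chomp so a missing newline on the last line does not prevent a match.
my %seen;
while (<FILE1>) {
    chomp;
    $seen{$_} = 1;
}

# FILE2 alternates header and sequence lines, so read two lines per pass.
while (my $header = <FILE2>) {
    my $next_line = <FILE2>;
    last unless defined $next_line;    # trailing header with no sequence line
    chomp $header;
    print OUTFILE "$header\n", $next_line if $seen{$header};
}
close OUTFILE;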
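The remark about tying %seen to DBM storage can be sketched as follows. This is a minimal illustration, not the answerer's code: it assumes the core SDBM_File module is acceptable as the DBM backend, and the helper name build_seen_db and its two parameters are hypothetical. SDBM limits the size of each key/value pair, which should be fine for short header lines but is worth checking for unusually long names.

```perl
use strict;
use warnings;
use Fcntl;        # provides O_RDWR and O_CREAT
use SDBM_File;    # DBM implementation shipped with Perl

# Hypothetical helper: store every line of $names_file as a key in an
# on-disk hash at $db_path and return a reference to the tied hash.
sub build_seen_db {
    my ($names_file, $db_path) = @_;

    # After tie, %seen lives in $db_path.pag / $db_path.dir, not in memory.
    tie my %seen, 'SDBM_File', $db_path, O_RDWR | O_CREAT, 0666
        or die "Cannot tie $db_path: $!";

    open my $in, '<', $names_file or die "Cannot open $names_file: $!";
    while (my $name = <$in>) {
        chomp $name;
        $seen{$name} = 1;
    }
    close $in;

    return \%seen;
}
```

With something like `my $seen = build_seen_db('FILE1.fasta', 'seen_names');`, the lookup in the second loop would become `$seen->{$header}`; the rest of the script stays the same.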
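Answer 1 (score: 0)
I would load the contents of file 2 into a hash keyed by header line, then check whether each record in file 1 exists in that hash: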
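#!perl
use strict;
use warnings;

my $f2 = 'FILE2.fasta';
open FILE2, "<", $f2 or die "Could not open $f2: $!\n";

# Map each header line of file 2 to the sequence line that follows it.
my %hash;
while (defined(my $k = <FILE2>)) {
    my $v = <FILE2>;
    last unless defined $v;    # trailing header with no sequence line
    chomp($k, $v);             # strip newlines from both, or output gets blank lines
    $hash{$k} = $v;
}
close FILE2;

my $f1 = 'FILE1.fasta';
open FILE1, "<", $f1 or die "Could not open $f1: $!\n";
open OUTFILE, ">", "sequences.fasta" or die "Cannot open sequences.fasta: $!\n";

# Print every name from file 1 that has a record in file 2, in file 1 order.
while (<FILE1>) {
    chomp;
    if (exists $hash{$_}) {
        print OUTFILE "$_\n";
        print OUTFILE "$hash{$_}\n";
    }
}
close OUTFILE;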
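To check the hash lookup without creating any files, the same idea can be run on in-memory lists. This is only a sketch on a subset of the question's sample data; the helper name select_records is hypothetical, and it assumes the records strictly alternate header and sequence, as in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: $names is an array ref of header lines to keep;
# $records is an array ref that alternates header, sequence, header, ...
sub select_records {
    my ($names, $records) = @_;

    # Map each header to the sequence that follows it.
    my %seq_for;
    for (my $i = 0; $i + 1 < @$records; $i += 2) {
        $seq_for{ $records->[$i] } = $records->[ $i + 1 ];
    }

    # Keep only the names that have a record, in the order they were given.
    return map { ($_, $seq_for{$_}) } grep { exists $seq_for{$_} } @$names;
}

my @names   = ('SEQUENCE 1 NAME', 'SEQUENCE 3 NAME');
my @records = (
    'SEQUENCE 1 NAME', 'AGTCAGTCAGTCAGTCAGTC',
    'SEQUENCE 2 NAME', 'AAGGGTTTTCCCCCCAAAAA',
    'SEQUENCE 3 NAME', 'GGGGTTTTTTTTTTAAAAAC',
);

# Prints the two matching headers, each followed by its sequence.
print "$_\n" for select_records(\@names, \@records);
```

Because the output follows the order of the names list, it is ordered like file 1 rather than like file 2, which matters only if the two files list sequences in different orders.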