I am new to Perl. I am trying to extract FASTA sequences from one file whose headers match lines in another file. The two sample files are as follows:
File1.fasta:
> gene_44 | 105_nt | + | 47540 | 47644
GTGCGCCGGCGCGTCGCGATCGCGAACCGGCCCGTGCGAATCCTGCCGCATGCGCGCCGCATCTCGCCACGCCGCGCATTTCATTTCGACATCCATAACGTCTGA
> gene_69 | 111_nt | + | 75846 | 75956
ATGCCGTTGCCGTCGCGCATCGCGGCGGCCGTGCGCGGCGCGCATGCATACGCCGGCACGGCCGATGCGCGCGCGACGCGCAAACTGCACGCGGCGCGGGATTTGTGTTGA
> gene_88 | 177_nt | - | 97993 | 98169
ATGCGCCAGCCGACGCACGCCCATTCCGGGCGAAACGTTCCCCTTATCCATTCGATCATCCGTGCCGCACTGCGCGAAGCGGCCACCGCCGACACGTACCAAACCGCGCTCGATGCGACCGGCGCGGCACTCGTCGCCATCGCGGCGCTCGTGCGCGCGGAGGTGCGGCATGGCTGA
> gene_90 | 141_nt | - | 99016 | 99156
TTGGAAGGGCGCTTTCCGCGTGCGAGTCGTCTGACGCAGCGTTGCACGGTCTGGTCGAATCGCGAGCTTCATCGCTGGATGGCCGATCCGTTGAACTATCGCGCTGTCGACGCGGCGAACCAGACGACGGAGGGCGCGTAA
File2.list:
somewordsinfront,> gene_44 | somewordsattheback
blablabla,> gene_88 | blablablablabla
My expected output is as follows:
> gene_44 | 105_nt | + | 47540 | 47644
GTGCGCCGGCGCGTCGCGATCGCGAACCGGCCCGTGCGAATCCTGCCGCATGCGCGCCGCATCTCGCCACGCCGCGCATTTCATTTCGACATCCATAACGTCTGA
> gene_88 | 177_nt | - | 97993 | 98169
ATGCGCCAGCCGACGCACGCCCATTCCGGGCGAAACGTTCCCCTTATCCATTCGATCATCCGTGCCGCACTGCGCGAAGCGGCCACCGCCGACACGTACCAAACCGCGCTCGATGCGACCGGCGCGGCACTCGTCGCCATCGCGGCGCTCGTGCGCGCGGAGGTGCGGCATGGCTGA
How can I achieve this? Thanks in advance! :)
Answer 0 (score: 0)
Next time you ask a question, please show the code you have tried. For example:
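use strict;
use warnings;

# Collect the gene names from File2.list (the text between '>' and '|').
my @genes;
open my $list, '<', 'File2.list' or die "Cannot open File2.list: $!";
while (my $line = <$list>) {
    push @genes, $1 if $line =~ />\s*([^|\s]+)/;
}
close $list;

# Slurp the whole FASTA file into one string.
my $input;
{
    local $/ = undef;
    open my $fasta, '<', 'File1.fasta' or die "Cannot open File1.fasta: $!";
    $input = <$fasta>;
    close $fasta;
}

# Split into records at each '>' and print those whose header names a listed gene.
my @records = split />/, $input;
foreach my $record (@records) {
    foreach my $gene (@genes) {
        if ($record =~ /^\s*\Q$gene\E\s*\|/) {
            print ">$record";
            last;    # print each record only once
        }
    }
}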
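The nested loops above compare every record against every gene name. As a rough alternative, the names could be kept in a hash so that each FASTA header needs only one lookup. The sketch below is only an illustration: the helper names gene_name and select_records are made up here, the data is inlined instead of read from the files, and the sequences are shortened for display. It assumes every header starts a new line with '>' and that the gene name is the first token after it.

```perl
use strict;
use warnings;

# Hypothetical helper: return the gene name found after the first '>' in a
# piece of text, e.g. "gene_44" from "blabla,> gene_44 | blabla"; undef if none.
sub gene_name {
    my ($text) = @_;
    return $text =~ />\s*([^|\s]+)/ ? $1 : undef;
}

# Hypothetical helper: given FASTA text and list text, return the FASTA
# records whose header names a gene that appears in the list.
sub select_records {
    my ($fasta_text, $list_text) = @_;

    # Build a lookup table of the wanted gene names.
    my %wanted;
    for my $line (split /\n/, $list_text) {
        my $name = gene_name($line);
        $wanted{$name} = 1 if defined $name;
    }

    # Split before every '>' that starts a line: one chunk per record.
    my @records = split /^(?=>)/m, $fasta_text;

    return grep {
        my $name = gene_name($_);
        defined $name && $wanted{$name};
    } @records;
}

# Inline sample data (sequences shortened for display; an assumption of this sketch).
my $fasta = <<'END_FASTA';
> gene_44 | 105_nt | + | 47540 | 47644
GTGCGCCGGCGCGTCGCGATCGCG
> gene_69 | 111_nt | + | 75846 | 75956
ATGCCGTTGCCGTCGCGCATCGCG
> gene_88 | 177_nt | - | 97993 | 98169
ATGCGCCAGCCGACGCACGCCCAT
END_FASTA

my $list = <<'END_LIST';
somewordsinfront,> gene_44 | somewordsattheback
blablabla,> gene_88 | blablablablabla
END_LIST

print select_records($fasta, $list);
```

Because the comparison is an exact hash lookup on the name, gene_4 in the list would not accidentally select gene_44, and a name listed twice still prints its record only once.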