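I have a multifasta file containing over a hundred protein sequences. I am trying to extract the sequences that contain a user-supplied motif. I tried the answer to the question "perl Script to search for a motif in a multifasta file and print the complete sequence along with the header line". It produces output, but the output is not correct.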
>KGHL009_Homo_sapiens
MFKSLIQFFKSKSNTSNIKKENAVQRQERQDIEGWITPYSGQELLNTELRQHHLGLLWQQVSMTREMFEH
LYQKPIERYAEMVQLLPASESHHHSHLGGMLDHGLEVISFAAKLRQNYVLPLNAAPEDQAKQKDAWTAAV
IYLALVHDIGKSIVDIEIQLQDGKRWLAWHGIPTLPYKFRYIKQRDYELHPVLGGFIANQLIAKETFDWL
ATYPEVFSALMYAMAGHYDKANVLAEIVQKADQNSVALALGGDITKLVQKPVISFAKQLI
>XIM5213_Mus_musculus
FKISSKGPGDGWLTEDGLWLMSKTTADQIRAYLMGQGISVPSDNRKLFDEMQAHRVIESTSEGNAIWYCQ
LSADAGWKPKDKFSLLRIKPEVIWDNIDDRPELFAGTICVVEKENEAEEKISNTVNEVQDTVPINKKENI
ELTSNLQEENTALQSLNPSQNPEVVVENCDNNSVDFLLNMFSDNNEQQVMNIPSADAEAGTTMILKSEPE
NLNTHIEVEANAIPKLPTNDDTHLKSEGQKFVDWLKD
and so on.
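#!/usr/bin/perl -w
use strict;
use warnings;

print "Enter motif:";
my $motif = <STDIN>;    # declared with "my" so the script compiles under "use strict"
die "No motif given\n" unless defined $motif;
chomp $motif;           # drop the trailing newline so it is not part of the pattern
die "No motif given\n" unless length $motif;    # refuse an empty motif instead of matching with it

my $seqfile = 'sequences.fasta';
my %seqs = %{ read_fasta_as_hash( $seqfile ) };

open( my $motiffile, ">", "motifseqs.fasta" ) or die $!;
foreach my $id ( keys %seqs ) {
    if ( $seqs{$id} =~ /$motif/ ) {
        # print the header line followed by the complete sequence
        print $motiffile $id, "\n";
        print $motiffile $seqs{$id}, "\n";
    }
}
close $motiffile or die $!;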
sub read_fasta_as_hash {
    my $fn = shift;
    my $current_id = '';
    my %seqs;
    # three-argument open with a lexical filehandle
    open( my $fh, "<", $fn ) or die "Cannot open $fn: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^(>.*)$/ ) {    # a header line starts a new record
            $current_id = $1;
        }
        elsif ( $line !~ /^\s*$/ ) {    # skip blank lines
            $seqs{$current_id} .= $line;    # join wrapped sequence lines
        }
    }
    close $fh or die $!;
    return \%seqs;
}
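I don't know what is going wrong. When I run the script on its own in the terminal, it works correctly and returns exactly 89 sequences containing the input motif, but after I incorporated it into another CGI script it returns 189 sequences, most of which do not contain the motif.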
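One thing worth checking, as a guess rather than a confirmed diagnosis, is whether the motif reaches the match unchanged when the script runs under CGI. There the value would normally come from the request parameters rather than from an interactive STDIN prompt, so the pattern could end up empty or carry stray whitespace. The sketch below is a minimal, self-contained illustration of validating the motif before matching. The helper names `clean_motif` and `ids_with_motif` are hypothetical, and the two records are shortened stand-ins for real entries.

```perl
use strict;
use warnings;

# Hypothetical helper: trim a raw motif string and reject empty input.
sub clean_motif {
    my ($raw) = @_;
    return undef unless defined $raw;
    $raw =~ s/^\s+|\s+$//g;    # strip the newline and any surrounding whitespace
    return length($raw) ? $raw : undef;
}

# Hypothetical helper: return the headers whose sequence matches the motif.
# The motif is used as a regular expression, as in the script above.
sub ids_with_motif {
    my ( $seqs, $motif ) = @_;
    my $re   = qr/$motif/;
    my @hits = grep { $seqs->{$_} =~ $re } keys %$seqs;
    return sort @hits;
}

# Shortened example records (assumed data, not the real file).
my %seqs = (
    '>seq1' => 'MFKSLIQFFKSK',
    '>seq2' => 'FKISSKGPGDGW',
);

my $motif = clean_motif("GPGD\n");    # simulates input with a trailing newline
die "No motif given\n" unless defined $motif;
print "$_\n" for ids_with_motif( \%seqs, $motif );    # prints ">seq2"
```

Logging the cleaned motif (for example to STDERR) from inside the CGI script would show whether it matches what was typed into the form.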