Extracting sequences from GenBank format

Date: 2013-06-26 14:04:50

Tags: perl bioinformatics bioperl genbank

This code used to work fine, but now it complains. Has the GenBank structure changed?

#!/usr/bin/perl -w
#use strinct ;

use Bio::SeqIO;
use Bio::Seq;
use Bio::DB::EUtilities;


    @refSeqIDs=qw(NC_000915.1 NC_017379.1 NC_017371.1 NC_017354.1);
    foreach my $refSeqIDs (@refSeqIDs){
        my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',-db=> 'protein',-  rettype => 'gb',
                                           -email   => 'x@y.com',-id=> $refSeqIDs);
        my $rawfile = "$refSeqIDs.gbk";
        $factory->get_Response(-file =>"$refSeqIDs.gbk");
        my $seqio_object = Bio::SeqIO->new(-format=>"Genbank",-file =>"$refSeqIDs.gbk");
        while ( my $seq_object=$seqio_object->next_seq){
            $sequence=$seq_object->seq;
            print ("$sequence\n");
        }
    }

1 answer:

Answer 0 (score: 0)

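Your $sequence variable is empty because these GenBank records contain no sequence. If you just want to download the complete genome sequences for these IDs, request FASTA instead of GenBank records.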

#!/usr/bin/env perl

use strict;
use warnings;
use Bio::DB::EUtilities;


my @refSeqIDs = qw(NC_000915.1 NC_017379.1 NC_017371.1 NC_017354.1);

my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch', 
                                       -db      => 'nucleotide', 
                                       -rettype => 'fasta',
                                       -email   => 'x@y.com',
                                       -id      => \@refSeqIDs);

print $factory->get_Response->content;

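If you want something different, please say what you are trying to extract. Also, it is best to always put use strict; use warnings; at the top of your scripts, since they help diagnose this kind of problem.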
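To check whether a downloaded .gbk file really lacks sequence data, one option is to look for sequence lines after the ORIGIN keyword. The sketch below uses only core Perl. Note that origin_sequence is a hypothetical helper, not a BioPerl function, and that both records are made up for illustration. It assumes that a record without embedded sequence carries a CONTIG line in place of ORIGIN sequence lines, as is typically the case for large genome records.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the sequence stored in the
# ORIGIN section of a GenBank flat-file record, or an empty string if the
# record has no sequence lines.
sub origin_sequence {
    my ($record) = @_;
    my $in_origin = 0;
    my $sequence  = '';
    for my $line (split /\n/, $record) {
        if ($line =~ /^ORIGIN/) { $in_origin = 1; next; }
        last if $line =~ m{^//};    # end-of-record marker
        next unless $in_origin;
        $line =~ s/[\d\s]+//g;      # drop position numbers and whitespace
        $sequence .= $line;
    }
    return $sequence;
}

# Two made-up records, for illustration only.
my $with_sequence = join "\n",
    'LOCUS       DEMO_1                    12 bp    DNA',
    'ORIGIN',
    '        1 acgtacgtac gt',
    '//';

my $without_sequence = join "\n",
    'LOCUS       DEMO_2               1600000 bp    DNA',
    'CONTIG      join(DEMO_PART.1:1..1600000)',
    '//';

for my $pair (['DEMO_1', $with_sequence], ['DEMO_2', $without_sequence]) {
    my ($name, $record) = @$pair;
    printf "%s: %d bases\n", $name, length origin_sequence($record);
}
```

If the second case matches your files, the parser is reading them correctly and there is simply nothing for $seq_object->seq to return. NCBI's efetch documentation also lists a gbwithparts return type for the nucleotide database, which may be worth trying if you need the GenBank annotations together with the sequence.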