SeqIO.parse python: Premature end of line during features table

Time: 2017-12-13 11:45:49

Tags: python parsing bioinformatics

Has anyone run into this problem before? Any suggestions about the cause?

The script does create the files containing the genome sequences, but the error shows up at the end of the process.

The line in my script:

File "scripts/list_ncbi_download_genome_vs_02.py", line 97, in <module>
    SeqIO.write(SeqIO.parse(genbank_file, "genbank"), genome_file, "fasta")

raises the following error:

  File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/__init__.py", line 481, in write
    count = writer_class(fp).write_file(sequences)
  File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/Interfaces.py", line 209, in write_file
    count = self.write_records(records)
  File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/Interfaces.py", line 193, in write_records
    for record in records:
  File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/__init__.py", line 600, in parse
    for r in i:
  File "/usr/lib/python2.7/dist-packages/Bio/GenBank/Scanner.py", line 478, in parse_records
    record = self.parse(handle, do_features)
  File "/usr/lib/python2.7/dist-packages/Bio/GenBank/Scanner.py", line 462, in parse
    if self.feed(handle, consumer, do_features):
  File "/usr/lib/python2.7/dist-packages/Bio/GenBank/Scanner.py", line 434, in feed
    self._feed_feature_table(consumer, self.parse_features(skip=False))
  File "/usr/lib/python2.7/dist-packages/Bio/GenBank/Scanner.py", line 159, in parse_features
    raise ValueError("Premature end of line during features table")

I can live with this, but it is not nice to finish a process and have this appear afterwards.
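A "premature end" error from a parser often points to an input file that stops before its record is complete. GenBank flat-file records end with a line containing only `//`, so one quick check is whether the last non-blank line of the file is that terminator. The sketch below uses only the standard library; the helper name `ends_with_record_terminator` and the two demo files are made up for illustration, and their contents are placeholder text rather than valid GenBank records.

```python
import os
import tempfile


def ends_with_record_terminator(path):
    """Return True if the last non-blank line of the file is '//'."""
    last = ""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                last = line.strip()
    return last == "//"


# Demo on two throwaway files (placeholder content, not real records).
tmp_dir = tempfile.mkdtemp()
complete = os.path.join(tmp_dir, "complete.gbff")
truncated = os.path.join(tmp_dir, "truncated.gbff")

with open(complete, "w") as handle:
    handle.write("LOCUS       demo\nFEATURES             Location/Qualifiers\nORIGIN\n//\n")
with open(truncated, "w") as handle:
    # Stops inside the features table, with no '//' terminator.
    handle.write("LOCUS       demo\nFEATURES             Location/Qualifiers\n")

complete_ok = ends_with_record_terminator(complete)
truncated_ok = ends_with_record_terminator(truncated)
```

Running the same check on each `.gbff` file right before it is parsed would show whether the file on disk is already incomplete at that moment.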

The file can be downloaded from https://github.com/felipelira/files_to_test/blob/master/GCF_000302915.1_Pav631_1.0_genomic.gbff

The block in my script that calls the command is:

## rename and move files to the output directory created in the command line:
genome_dict = {}
genome_list = []
for genbank_file in list_uncompressed:
    organism = genbank_file.split('/')[0]
    file_name = genbank_file.split('/')[-1]
    genome_file = organism +'_'+ file_name.split('_')[0] +'_'+ file_name.split('_')[1]+'.fna'
    genome_list.append(genome_file)
    genome_dict[genome_file.replace('.fna', '')] = organism
    # print genome_dict
    print "Dealing with GenBank record %s" % genome_file
    SeqIO.write(SeqIO.parse(genbank_file, "genbank"), os.path.join(outdir, genome_file), "fasta")
    print "Genome saved %s" % genome_file
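To make the naming step in the loop above easier to follow, here is the same string logic pulled into a small function. This is only a sketch: the function name `fasta_name_for` and the directory name `my_organism` are hypothetical, and it assumes, as the loop does, that the first path component is the organism directory.

```python
import os


def fasta_name_for(genbank_path):
    """Build '<organism>_<prefix>_<accession>.fna' from a relative GenBank path."""
    organism = genbank_path.split('/')[0]        # first directory in the path
    file_name = os.path.basename(genbank_path)   # e.g. GCF_000302915.1_..._genomic.gbff
    parts = file_name.split('_')
    return organism + '_' + parts[0] + '_' + parts[1] + '.fna'


# 'my_organism' is a placeholder directory name, not taken from the question.
example_path = 'my_organism/GCF_000302915.1_Pav631_1.0_genomic.gbff'
example_name = fasta_name_for(example_path)
```

With this input, `example_name` is `my_organism_GCF_000302915.1.fna`.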

1 answer:

Answer 0 (score: 0)

The problem was solved by following the suggestions in this biostars.org post: https://www.biostars.org/p/289314/#289407

Suggestion from Philipp Bayer (https://www.biostars.org/u/4678/):

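"Normally this should work (and it does on my system). Did you write genbank_file earlier in the script? Maybe you have not closed the file handle yet, so the written file has not been synced?"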
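The effect described in that suggestion can be reproduced without Biopython. The sketch below writes a short string through a handle that is left open, reads the file back, then closes the handle and reads again. The file name `demo.gbff` and the variable names are made up for the demonstration; how much data is visible before the close depends on the buffer size, but a small write usually has not reached the disk yet.

```python
import os
import tempfile


def read_text(path):
    """Read the whole file through a separate, properly closed handle."""
    with open(path) as handle:
        return handle.read()


tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.gbff")  # hypothetical file name
text = "FEATURES             Location/Qualifiers\n"

# Write but do NOT close: the data may still sit in the write buffer.
writer = open(path, "w")
writer.write(text)
before_close = read_text(path)  # often empty or incomplete at this point

# Closing flushes the buffer, so a reader now sees the full content.
writer.close()
after_close = read_text(path)
```

If the script that writes the `.gbff` files keeps their handles open, a parser reading them in the same run could see a truncated file in the same way. Writing inside a `with open(...)` block, or calling `close()` before parsing, avoids this.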

And from a.zielezinski (https://www.biostars.org/u/4700/):

from Bio import SeqIO

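# Convert each GenBank file to FASTA, one record at a time
genbank_files = ['GCF_000302915.1_Pav631_1.0_genomic.gbff']
for genbank_file in genbank_files:
    # The with block closes both handles, even if parsing raises an error
    with open(genbank_file) as fh, open(genbank_file + '.fasta', 'w') as oh:
        for seq_record in SeqIO.parse(fh, 'genbank'):
            oh.write(seq_record.format('fasta'))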