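Has anyone had this problem before? Any suggestions about the cause?
The script creates the file containing the genome sequences, but the error below appears at the end of the process.
The line in my script: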
File "scripts/list_ncbi_download_genome_vs_02.py", line 97, in <module>
SeqIO.write(SeqIO.parse(genbank_file, "genbank"), genome_file, "fasta")
The warning that appears:
File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/__init__.py", line 481, in write
count = writer_class(fp).write_file(sequences)
File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/Interfaces.py", line 209, in write_file
count = self.write_records(records)
File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/Interfaces.py", line 193, in write_records
for record in records:
File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/__init__.py", line 600, in parse
for r in i:
File "/usr/lib/python2.7/dist-packages/Bio/GenBank/Scanner.py", line 478, in parse_records
record = self.parse(handle, do_features)
File "/usr/lib/python2.7/dist-packages/Bio/GenBank/Scanner.py", line 462, in parse
if self.feed(handle, consumer, do_features):
File "/usr/lib/python2.7/dist-packages/Bio/GenBank/Scanner.py", line 434, in feed
self._feed_feature_table(consumer, self.parse_features(skip=False))
File "/usr/lib/python2.7/dist-packages/Bio/GenBank/Scanner.py", line 159, in parse_features
raise ValueError("Premature end of line during features table")
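I can live with this, but it is not nice to finish a process and have this appear right after it.
The file is available for download at https://github.com/felipelira/files_to_test/blob/master/GCF_000302915.1_Pav631_1.0_genomic.gbff
The block in my script that calls the command is: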
## rename and move files to the output directory created in the command line:
genome_dict = {}
genome_list = []
for genbank_file in list_uncompressed:
    organism = genbank_file.split('/')[0]
    file_name = genbank_file.split('/')[-1]
    genome_file = organism + '_' + file_name.split('_')[0] + '_' + file_name.split('_')[1] + '.fna'
    genome_list.append(genome_file)
    genome_dict[genome_file.replace('.fna', '')] = organism
    #print genome_dict
    print "Dealing with GenBank record %s" % genome_file
    SeqIO.write(SeqIO.parse(genbank_file, "genbank"), os.path.join(outdir, genome_file), "fasta")
    print "Genome saved %s" % genome_file
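Answer 0 (score: 0)
The problem was solved by following the suggestions in this post on biostars.org: https://www.biostars.org/p/289314/#289407
The suggestion from Philipp Bayer (https://www.biostars.org/u/4678/):
Normally this should work (and it does on my system). Did you write to genbank_file earlier in the script? Maybe you have not closed the file handle, so the written file has not been synced to disk yet?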
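To illustrate the point about unclosed handles, here is a minimal sketch using only the standard library. The function name `sizes_before_and_after_close` and the tiny placeholder text are hypothetical and not part of the original script. It compares the file size on disk while the writer is still open with the size after `close()`. The first number can be smaller because data may still be sitting in the writer's buffer, which is the situation in which a parser would see a truncated file.

```python
import os
import tempfile


def sizes_before_and_after_close(text):
    """Return (size on disk while the writer is open, size after close)."""
    path = os.path.join(tempfile.mkdtemp(), "example.gbff")
    handle = open(path, "w")
    handle.write(text)
    # The text may still be in the writer's buffer at this point, so a
    # reader opening the file now (for example SeqIO.parse) could see
    # an incomplete file.
    size_open = os.path.getsize(path)
    handle.close()  # closing flushes the buffer to disk
    size_closed = os.path.getsize(path)
    return size_open, size_closed


# Placeholder content, not a valid GenBank record.
text = "LOCUS       EXAMPLE\n//\n"
size_open, size_closed = sizes_before_and_after_close(text)
print(size_open, size_closed)
```

If the first number is smaller than the second, anything that parsed the file before `close()` was reading an incomplete copy. Whether this is what happened in the original script is not confirmed in the thread.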
And the suggestion from a.zielezinski (https://www.biostars.org/u/4700/):
from Bio import SeqIO

l = ['GCF_000302915.1_Pav631_1.0_genomic.gbff']
for genbank_file in l:
    # Open the input and output handles explicitly so they can be closed.
    fh = open(genbank_file)
    oh = open(genbank_file + '.fasta', 'w')
    for seq_record in SeqIO.parse(fh, 'genbank'):
        oh.write(seq_record.format('fasta'))
    oh.close()
    fh.close()
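If the "Premature end of line during features table" error comes from an input file that was cut short (an assumption; the thread does not confirm the cause), one quick check is whether the file ends with the `//` line that terminates each GenBank record. The sketch below uses only the standard library. The helper name `ends_with_terminator` and the two tiny test files are made up for illustration and are not valid GenBank records.

```python
import os
import tempfile


def ends_with_terminator(path):
    """Return True if the last non-blank line of the file is '//'."""
    last = ""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                last = line.strip()
    return last == "//"


# Tiny made-up files, for illustration only.
tmp_dir = tempfile.mkdtemp()
complete = os.path.join(tmp_dir, "complete.gbff")
truncated = os.path.join(tmp_dir, "truncated.gbff")
with open(complete, "w") as out:
    out.write("LOCUS       EXAMPLE\nFEATURES             Location/Qualifiers\n//\n")
with open(truncated, "w") as out:
    out.write("LOCUS       EXAMPLE\nFEATURES             Location/Qualifiers\n")

print(ends_with_terminator(complete), ends_with_terminator(truncated))
```

Running such a check on each entry of `list_uncompressed` before calling `SeqIO.parse` would point to any file that was incompletely written or decompressed.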