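I should start by saying that I'm as new to Python as I am to Biopython. I'm trying to split a large .fasta file (containing multiple entries) into individual files, each holding a single entry. I found most of the following code on the Biopython wiki/Cookbook site and modified it a little. My problem is that this generator names the files "1.fasta", "2.fasta", etc., and I need them named by some identifier, such as the GI number.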
```python
def batch_iterator(iterator, batch_size):
    """Returns lists of length batch_size.

    This can be used on any iterator, for example to batch up
    SeqRecord objects from Bio.SeqIO.parse(...), or to batch
    Alignment objects from Bio.AlignIO.parse(...), or simply
    lines from a file handle.

    This is a generator function, and it returns lists of the
    entries from the supplied iterator. Each list will have
    batch_size entries, although the final list may be shorter.
    """
    entry = True  # Make sure we loop once
    while entry:
        batch = []
        while len(batch) < batch_size:
            try:
                entry = next(iterator)
            except StopIteration:
                entry = None
            if entry is None:
                # End of file
                break
            batch.append(entry)
        if batch:
            yield batch

from Bio import SeqIO

infile = input('Which .fasta file would you like to open? ')
record_iter = SeqIO.parse(open(infile), "fasta")
for i, batch in enumerate(batch_iterator(record_iter, 1)):
    outfile = "c:\python32\myfiles\%i.fasta" % (i+1)
    handle = open(outfile, "w")
    count = SeqIO.write(batch, handle, "fasta")
    handle.close()
```
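For comparison, the same batching idea can be written more compactly with itertools.islice. This is a sketch, not the Cookbook's code: unlike the version above it does not treat a None item as end of input, and plain integers stand in for SeqRecord objects. It also shows what the loop above receives with batch_size=1: each batch is a one-element list, so batch[0].id (not record_iter.id) would be the id of the record being written.

```python
from itertools import islice

def batch_iterator(iterator, batch_size):
    """Yield lists of up to batch_size items taken from iterator."""
    iterator = iter(iterator)  # accept any iterable, not only iterators
    while True:
        # Take the next batch_size items (fewer at the end of the input)
        batch = list(islice(iterator, batch_size))
        if not batch:
            # Underlying iterator is exhausted
            return
        yield batch

# Integers standing in for SeqRecord objects
print(list(batch_iterator(range(5), 2)))  # [[0, 1], [2, 3], [4]]
print(list(batch_iterator(range(3), 1)))  # [[0], [1], [2]]
```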
If I try to replace:
```python
outfile = "c:\python32\myfiles\%i.fasta" % (i+1)
```
with:
```python
outfile = "c:\python32\myfiles\%s.fasta" % (record_iter.id)
```
so that each file is named with something like seq_record.id from SeqIO, I get the following error:
```
Traceback (most recent call last):
  File "C:\Python32\myscripts\generator.py", line 33, in <module>
    outfile = "c:\python32\myfiles\%s.fasta" % (record_iter.id)
AttributeError: 'generator' object has no attribute 'id'
```
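I understand that the generator object has no attribute 'id', but is there some way around this? Or is this script too complicated for what I'm trying to do?!? Thanks, Charles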
Answer 0 (score: 2)
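Since you only want one record at a time, you can drop the batch_iterator wrapper and the enumerate and loop over the records directly:
```python
for seq_record in record_iter:
```
Then what you want is the id attribute of each record, not of the whole iterator: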
```python
for seq_record in record_iter:
    outfile = "c:\python32\myfiles\{0}.fasta".format(seq_record.id)
    handle = open(outfile, "w")
    count = SeqIO.write(seq_record, handle, "fasta")
    handle.close()
```
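One practical snag: for the "fasta" format, seq_record.id is the first word of the header line, and NCBI headers often give ids like gi|5524211|gb|AAD44166.1|. The "|" character is not allowed in Windows file names, so opening such a path would fail. Since the question asks for the GI number specifically, the sketch below uses a hypothetical helper, filename_for, that keeps only the GI number when the id starts with "gi|" and otherwise replaces characters Windows rejects. The example ids are illustrative, not taken from the asker's file.

```python
import re

def filename_for(record_id):
    """Hypothetical helper: build a Windows-safe .fasta file name from a record id."""
    # NCBI-style ids such as 'gi|5524211|gb|AAD44166.1|': keep only the GI number
    match = re.match(r"gi\|(\d+)", record_id)
    if match:
        return match.group(1) + ".fasta"
    # Otherwise replace characters that are invalid in Windows file names
    return re.sub(r'[<>:"/\\|?*]', "_", record_id) + ".fasta"

print(filename_for("gi|5524211|gb|AAD44166.1|"))  # 5524211.fasta
print(filename_for("sp|P69905|HBA_HUMAN"))        # sp_P69905_HBA_HUMAN.fasta
```

Inside the loop above it would be used as os.path.join(r"c:\python32\myfiles", filename_for(seq_record.id)), with import os at the top of the script.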
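For reference, the generator error happens because you are trying to get the attribute id from the object record_iter. record_iter is not a single record but a series of records held as a Python generator, which is a bit like a list that is still being produced: records are read from the file one at a time as you loop, instead of all at once, so memory use is much lower. More on generators: What can you use Python generator functions for?, http://docs.python.org/tutorial/classes.html#generators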
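The difference between the generator and the records it yields can be seen without Biopython. In this sketch a namedtuple called Record stands in for SeqRecord and a toy generator stands in for SeqIO.parse; both are assumptions for illustration only. It also shows that a generator is consumed once, so after the loop record_iter is empty.

```python
from collections import namedtuple

# Stand-in for Bio.SeqRecord.SeqRecord: only the .id attribute matters here
Record = namedtuple("Record", ["id", "seq"])

def parse_records():
    """Toy generator that yields records one at a time, like SeqIO.parse."""
    for rid, seq in [("gi|1|", "ACGT"), ("gi|2|", "GGCC")]:
        yield Record(rid, seq)

record_iter = parse_records()
print(hasattr(record_iter, "id"))  # False: the generator itself has no id
for seq_record in record_iter:
    print(seq_record.id)           # each yielded record does: gi|1|, then gi|2|
print(list(record_iter))           # []: the generator is already exhausted
```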