I am trying to add more than 70,000 new features to a GenBank file using Biopython.
I have this code:
from Bio import SeqIO
from Bio.SeqFeature import SeqFeature, FeatureLocation

fi = "myoriginal.gbk"
fo = "mynewfile.gbk"

# results: list of "start\tend" strings, built elsewhere
for result in results:
    start = 0
    end = 0
    result = result.split("\t")
    start = int(result[0])
    end = int(result[1])
    for record in SeqIO.parse(fi, "gb"):
        record.features.append(SeqFeature(FeatureLocation(start, end), type="misc_feat"))
        SeqIO.write(record, fo, "gb")
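results is simply a list of tab-separated strings, each holding the start and end of one feature I need to add to the original gbk file.
Running this is extremely expensive on my computer, and I don't know how to improve its performance. Any good ideas?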
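A minimal sketch of turning results into integer pairs up front, assuming (as the split("\t") in the code implies) that the first two tab-separated fields of each string are the start and end. parse_intervals is a hypothetical helper name. Note that Biopython's FeatureLocation uses 0-based, end-exclusive coordinates, so if results holds 1-based GenBank-style positions, subtract 1 from each start before building the location.

```python
def parse_intervals(results):
    """Convert "start<TAB>end" strings into (start, end) integer pairs.

    Hypothetical helper: assumes the first two tab-separated fields are
    0-based, end-exclusive coordinates, as FeatureLocation expects.
    """
    intervals = []
    for line in results:
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[0]), int(fields[1])
        # FeatureLocation rejects end < start, so fail early with context
        if start > end:
            raise ValueError(f"start {start} is after end {end} in {line!r}")
        intervals.append((start, end))
    return intervals


print(parse_intervals(["10\t20", "30\t45\n"]))  # [(10, 20), (30, 45)]
```

This step is cheap; the expensive part of the code above is what happens to the GenBank file inside the loop.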
Answer 0 (score: 1)
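You should parse the GenBank file only once. Your code re-parses the entire file for every one of the 70,000+ entries in results, and because SeqIO.write(record, fo, "gb") opens fo in write mode on every call, it also overwrites the output file each time. Leaving aside what results contains (I can't tell exactly, since part of your example is missing), I would expect this modified code to improve performance: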
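from Bio import SeqIO
from Bio.SeqFeature import SeqFeature, FeatureLocation

fi = "myoriginal.gbk"
fo = "mynewfile.gbk"
# Parse the GenBank file once and keep every record in memory
original_records = list(SeqIO.parse(fi, "gb"))

for result in results:
    # Each result is a tab-separated "start\tend" string
    result = result.split("\t")
    start = int(result[0])
    end = int(result[1])
    for record in original_records:
        # "misc_feature" is the standard INSDC feature key
        record.features.append(SeqFeature(FeatureLocation(start, end), type="misc_feature"))

# Write all modified records in a single call, after every feature is added
SeqIO.write(original_records, fo, "gb")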
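A self-contained sketch of the same parse-once, write-once idea that runs without myoriginal.gbk: a small in-memory record (hypothetical demo data) is written to and read back from an io.StringIO in place of the real file, and add_features is a hypothetical helper, not part of Biopython. It assumes the intervals are already integer pairs.

```python
import io

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio.SeqRecord import SeqRecord


def add_features(records, intervals, feature_type="misc_feature"):
    """Append one feature per (start, end) pair to every record in place."""
    for record in records:
        # Fresh SeqFeature objects per record, so records never share features
        record.features.extend(
            SeqFeature(FeatureLocation(start, end), type=feature_type)
            for start, end in intervals
        )
    return records


# Hypothetical demo record standing in for myoriginal.gbk
demo = SeqRecord(
    Seq("ACGT" * 25),
    id="demo1",
    name="demo1",
    annotations={"molecule_type": "DNA"},
)
source = io.StringIO()
SeqIO.write([demo], source, "genbank")
source.seek(0)

records = list(SeqIO.parse(source, "genbank"))  # parse once
add_features(records, [(0, 10), (20, 35)])  # add every feature in memory

out = io.StringIO()
SeqIO.write(records, out, "genbank")  # write once
print(out.getvalue().count("misc_feature"))  # 2
```

With real files, swap the StringIO handles for SeqIO.parse(fi, "gb") and SeqIO.write(records, fo, "gb"). The file is then read and written once each, instead of once per feature.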