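Improving the performance of adding features to a GenBank file

Time: 2015-07-08 11:09:31

Tags: performance biopython genbank

I am trying to add more than 70,000 new features to a GenBank file using Biopython.

I have this code: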

from Bio import SeqIO
from Bio.SeqFeature import SeqFeature, FeatureLocation

fi = "myoriginal.gbk"
fo = "mynewfile.gbk"

# results: list of tab-separated "start<TAB>end" strings (defined elsewhere)
for result in results:
    result = result.split("\t")
    start = int(result[0])
    end = int(result[1])

    # The input file is re-parsed and the output rewritten for every result
    for record in SeqIO.parse(fi, "gb"):
        record.features.append(SeqFeature(FeatureLocation(start, end), type="misc_feat"))
        SeqIO.write(record, fo, "gb")

results is just a list containing the start and end of each feature I need to add to the original gbk file.
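For illustration, the coordinates could be converted to integer pairs once, before touching any records. This is a minimal sketch that assumes each entry is a tab-separated string such as "100\t250" (inferred from the split("\t") call in the code); the helper name parse_coordinates and the sample data are hypothetical.

```python
def parse_coordinates(results):
    """Convert tab-separated "start<TAB>end" strings into (start, end) integer pairs."""
    coordinates = []
    for line in results:
        fields = line.rstrip("\n").split("\t")
        coordinates.append((int(fields[0]), int(fields[1])))
    return coordinates


# Hypothetical sample data; the real `results` list is not shown in the question.
results = ["0\t10", "25\t40"]
print(parse_coordinates(results))  # prints [(0, 10), (25, 40)]
```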

This solution is extremely expensive for my computer, and I don't know how to improve its performance. Any good ideas?

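1 answer:

Answer 0 (score: 1)

You should parse the GenBank file only once. Leaving aside what results contains (I don't know exactly, because some code is missing from your example), I would guess that modifying your code like this improves performance: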

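from Bio import SeqIO
from Bio.SeqFeature import SeqFeature, FeatureLocation

fi = "myoriginal.gbk"
fo = "mynewfile.gbk"

# Parse the input once and keep the records in memory
original_records = list(SeqIO.parse(fi, "gb"))

for result in results:
    result = result.split("\t")
    start = int(result[0])
    end = int(result[1])

    for record in original_records:
        record.features.append(SeqFeature(FeatureLocation(start, end), type="misc_feat"))

# Write all records once, after every feature has been added
SeqIO.write(original_records, fo, "gb")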
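
The same idea can be wrapped in a small reusable function: read the file once, extend each record's feature list in memory, and write the output once at the end. This is only a sketch under some assumptions: the function names add_features_once and rewrite_genbank are hypothetical, the coordinates are assumed to be already converted to (start, end) integer pairs, and the feature type defaults to "misc_feature" (the standard GenBank feature key) rather than the "misc_feat" used in the question.

```python
from Bio import SeqIO
from Bio.SeqFeature import SeqFeature, FeatureLocation


def add_features_once(records, coordinates, feature_type="misc_feature"):
    """Append one feature per (start, end) pair to every record, in memory."""
    for record in records:
        record.features.extend(
            SeqFeature(FeatureLocation(start, end), type=feature_type)
            for start, end in coordinates
        )
    return records


def rewrite_genbank(in_path, out_path, coordinates):
    """Parse in_path once, add the features, and write out_path once.

    Returns the number of records written.
    """
    records = list(SeqIO.parse(in_path, "genbank"))
    add_features_once(records, coordinates)
    return SeqIO.write(records, out_path, "genbank")
```

With this layout the file is read and written exactly once regardless of how many features are added, so the cost of the loop is dominated by creating the SeqFeature objects. A call might look like rewrite_genbank("myoriginal.gbk", "mynewfile.gbk", coordinates).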