Iterating over a series of GenBank genes and appending each gene's features to a list returns only the last gene

Asked: 2019-03-12 18:10:53

Tags: python list bioinformatics biopython genbank

I have a problem with my code. I am trying to use BioPython to iterate over the list of genes in a GenBank file. It looks like this:

from Bio import SeqIO

class genBank:
    gbProtId = str()
    gbStart = int()
    gbStop = int()
    gbStrand = int()

genBankEntries = list()

for seq_record in SeqIO.parse(genBankFile, "genbank"):
    for seq_feature in seq_record.features:
        genBankEntry = genBank
        if seq_feature.type == "CDS":
            genBankEntry.gbProtId = seq_feature.qualifiers['protein_id']
            genBankEntry.gbStart = seq_feature.location.start # prodigal GFF3 output is 1 based indexing
            genBankEntry.gbStop = seq_feature.location.end 
            genBankEntry.gbStrand = seq_feature.strand
            genBankEntries.append(genBankEntry)

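It looks like it should work, but when I run it, the resulting structure genBankEntries is just a huge list with one element per gene in the GenBank file, and every element holds only the final values from seq_record.features: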

00 = {type} <class '__main__.genBank'>
 gbProtId = {list} ['BAA31840.1']
 gbStart = {ExactPosition} 90649
 gbStop = {ExactPosition} 91648
 gbStrand = {int} 1
...
82 = {type} <class '__main__.genBank'>
 gbProtId = {list} ['BAA31840.1']
 gbStart = {ExactPosition} 90649
 gbStop = {ExactPosition} 91648
 gbStrand = {int} 1

This is especially confusing because the two for loops themselves seem to work fine:

for seq_record in SeqIO.parse(genBankFile, "genbank"):
    for seq_feature in seq_record.features:
        print(seq_feature)

Why is this?
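A likely cause, judging from the debugger output (every element is shown as `{type} <class '__main__.genBank'>`, not as an instance): `genBankEntry = genBank` has no parentheses, so it binds the class itself instead of creating a new object. Each pass through the loop then overwrites the same class attributes, and the list ends up holding many references to that one class. The sketch below reproduces the symptom without BioPython and shows one possible fix. `SharedEntry`, `GeneEntry`, and the `features` tuples are hypothetical stand-ins, not part of the original code.

```python
from dataclasses import dataclass


class SharedEntry:
    # Class attribute: a single value shared through the class object.
    prot_id = ""


@dataclass
class GeneEntry:
    # Instance fields: each GeneEntry(...) call creates a separate object.
    prot_id: str
    start: int
    stop: int
    strand: int


# Hypothetical stand-ins for CDS features: (protein_id, start, end, strand).
features = [("P1", 0, 90, 1), ("P2", 100, 400, -1), ("P3", 500, 800, 1)]

# Reproduces the symptom: no parentheses, so `entry` is the class itself.
buggy = []
for prot_id, start, stop, strand in features:
    entry = SharedEntry        # binds the class, does not create an instance
    entry.prot_id = prot_id    # overwrites the one shared class attribute
    buggy.append(entry)        # appends another reference to the same class

# One possible fix: build a new instance for every feature.
fixed = [GeneEntry(*feature) for feature in features]

print([e.prot_id for e in buggy])  # ['P3', 'P3', 'P3']
print([e.prot_id for e in fixed])  # ['P1', 'P2', 'P3']
```

Applied to the loop in the question, this would presumably mean creating the object with `genBank()` inside the `if seq_feature.type == "CDS":` branch, so that each CDS gets its own entry. The debugger output also shows `gbProtId` as a list (`['BAA31840.1']`), so indexing the qualifier with `[0]` may be needed if a plain string is wanted.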

0 answers:

No answers