Printing a dictionary produces duplicates

Time: 2016-12-02 23:49:24

Tags: python dictionary

I have already asked a question about this program here.

The code I am currently running is:

import re


out = open("parse_go.txt", "w")

id_to_info = {} #declare dictionary

def parse_record(term):
    go_id = re.findall(r"id:\s(.*?)\n", term, re.DOTALL)[0]
    name = re.findall(r"name:\s(.*?)\n", term, re.DOTALL)[0]
    namespace = re.findall(r"namespace:\s(.*?)\n", term, re.DOTALL)[0]
    is_a = re.findall(r"is_a:\s(.*?)\n", term, re.DOTALL)
    is_a = "\n\t".join(is_a)
    info = namespace + "\n" + "\t" + name + "\n" + "\t" +  is_a
    id_to_info[go_id] = info
    for go_id, info in id_to_info.items():
        out.write(go_id + "\t" + info + "\n\n")
    # for go_id in id_to_info:
    #    out.write(go_id + "\t" + info + "\n\n")        


def split_record(record):
    sp_file = open(record)
    sp_records = sp_file.read()
    sp_split_records = re.findall(r"(\[.*?)\n\n", sp_records, re.DOTALL)
    for sp_record in sp_split_records:
        parse_record(term=sp_record)
    sp_file.close()

split_record(record="/scratch/go-basic.obo")

However, from the beginning of the output file I can see that the same results are being printed multiple times.

GO:0000001      biological_process
        mitochondrion inheritance
        GO:0048308 ! organelle inheritance

GO:0000002      biological_process
        mitochondrial genome maintenance


GO:0000001      biological_process
        mitochondrion inheritance
        GO:0048308 ! organelle inheritance

GO:0000002      biological_process
        mitochondrial genome maintenance


GO:0000003      biological_process
        reproduction


GO:0000001      biological_process
        mitochondrion inheritance
        GO:0048308 ! organelle inheritance

GO:0000005      molecular_function
        ribosomal chaperone activity


GO:0000002      biological_process
        mitochondrial genome maintenance


GO:0000003      biological_process
        reproduction


GO:0000001      biological_process
        mitochondrion inheritance
        GO:0048308 ! organelle inheritance

The beginning of the input file is shown below. It is a very large file, and the program takes a long time to run on it.

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:11389764]
synonym: "mitochondrial inheritance" EXACT []
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process
def: "The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome." [GOC:ai, GOC:vw]
is_a: GO:0007005 ! mitochondrion organization

[Term]
id: GO:0000003
name: reproduction
namespace: biological_process
alt_id: GO:0019952
alt_id: GO:0050876
def: "The production of new individuals that contain some portion of genetic material inherited from one or more parent organisms." [GOC:go_curators, GOC:isa_complete, GOC:jl, ISBN:0198506732]
subset: goslim_generic
subset: goslim_pir
subset: goslim_plant
subset: gosubset_prok
synonym: "reproductive physiological process" EXACT [] 
xref: Wikipedia:Reproduction
is_a: GO:0008150 ! biological_process

[Term]
id: GO:0000005
name: ribosomal chaperone activity
namespace: molecular_function
def: "OBSOLETE. Assists in the correct assembly of ribosomes or ribosomal subunits in vivo, but is not a component of the assembled ribosome when performing its normal biological function." [GOC:jl, PMID:12150913]
comment: This term was made obsolete because it refers to a class of gene products and a biological process rather than a molecular function.
is_obsolete: true
consider: GO:0042254
consider: GO:0044183
consider: GO:0051082
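
A side observation when comparing this input with the output shown earlier: GO:0000002 has an `is_a` line in the input but none in the output, and GO:0000001 shows only one of its two. A likely cause, offered as an assumption rather than a confirmed diagnosis, is that the split pattern `(\[.*?)\n\n` drops the newline after the last line of each record, so a pattern that requires a trailing `\n` cannot match that last line. The sketch below checks this on a single record; `term`, `needs_newline`, and `line_anchored` are hypothetical names used only for illustration.

```python
import re

# One record as parse_record would receive it: the final line has no
# trailing newline because the split pattern stops before "\n\n".
term = (
    "[Term]\n"
    "id: GO:0000002\n"
    "name: mitochondrial genome maintenance\n"
    "namespace: biological_process\n"
    "is_a: GO:0007005 ! mitochondrion organization"
)

# Pattern from the posted code: the value must be followed by "\n".
needs_newline = re.findall(r"is_a:\s(.*?)\n", term, re.DOTALL)

# Line-anchored alternative: with re.MULTILINE, "$" also matches at the
# very end of the string, so the last line is picked up as well.
line_anchored = re.findall(r"^is_a:\s(.*)$", term, re.MULTILINE)

print(needs_newline)
print(line_anchored)
```

Anchoring with `^` also keeps a pattern such as `^id:` from matching `alt_id:` lines.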

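I know that a dictionary is not kept in numerical order, but I would like to know whether multiple printouts like this are expected, or whether they are caused by an error in my code?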
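
One possible explanation, sketched under assumptions: in the posted code the `for go_id, info in id_to_info.items()` loop sits inside `parse_record`, so everything collected so far is written again after every record. With n records that would give 1 + 2 + ... + n entries, which is consistent with the growing repeats in the output above. The sketch below reproduces the pattern on a three-record sample and contrasts it with writing once after all records are parsed. `SAMPLE`, `field`, `write_repeated`, and `write_once` are hypothetical names, not part of the original program, and an in-memory `io.StringIO` stands in for the output file.

```python
import io
import re

# Three shortened records in the same layout as the input file.
SAMPLE = (
    "[Term]\nid: GO:0000001\nname: mitochondrion inheritance\n"
    "namespace: biological_process\n\n"
    "[Term]\nid: GO:0000002\nname: mitochondrial genome maintenance\n"
    "namespace: biological_process\n\n"
    "[Term]\nid: GO:0000003\nname: reproduction\n"
    "namespace: biological_process\n\n"
)


def field(term, key):
    """Return the first value of `key` in one record (hypothetical helper)."""
    return re.search(rf"^{key}:\s(.*)$", term, re.MULTILINE).group(1)


def write_repeated(text, out):
    """Mimic the posted code: dump the whole dictionary after each record."""
    id_to_info = {}
    for term in re.findall(r"(\[.*?)\n\n", text, re.DOTALL):
        id_to_info[field(term, "id")] = (
            field(term, "namespace") + "\n\t" + field(term, "name")
        )
        for go_id, info in id_to_info.items():  # runs once per record
            out.write(go_id + "\t" + info + "\n\n")


def write_once(text, out):
    """Build the dictionary first, then write each entry exactly once."""
    id_to_info = {}
    for term in re.findall(r"(\[.*?)\n\n", text, re.DOTALL):
        id_to_info[field(term, "id")] = (
            field(term, "namespace") + "\n\t" + field(term, "name")
        )
    for go_id, info in sorted(id_to_info.items()):  # sorted by GO id
        out.write(go_id + "\t" + info + "\n\n")


if __name__ == "__main__":
    repeated, once = io.StringIO(), io.StringIO()
    write_repeated(SAMPLE, repeated)
    write_once(SAMPLE, once)
    print(repeated.getvalue().count("GO:"), once.getvalue().count("GO:"))
```

If this is the cause, moving the write loop out of `parse_record` so that it runs after `split_record` has processed every record should remove the repeats, and it would also avoid rewriting the whole dictionary for each record of a large file. Sorting the items is optional and is only there to give a predictable order.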

0 answers:

No answers yet.