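I'm new to Python and am building a dictionary from a file, then iterating over that dictionary. I've been working in Eclipse and get no output at all, not even a warning.
The input looks like this (the actual input is considerably larger):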
[Term]
id: GO:0000010
name: trans-hexaprenyltranstransferase activity
namespace: molecular_function
def: "Catalysis of the reaction: all-trans-hexaprenyl diphosphate + isopentenyl diphosphate = all-trans-heptaprenyl diphosphate + diphosphate." [KEGG:R05612, RHEA:20839]
subset: gosubset_prok
xref: KEGG:R05612
xref: RHEA:20839
is_a: GO:0016765 ! transferase activity, transferring alkyl or aryl (other than methyl) groups
[Term]
id: GO:0000011
name: vacuole inheritance
namespace: biological_process
def: "The distribution of vacuoles into daughter cells after mitosis or meiosis, mediated by interactions between vacuoles and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:14616069]
is_a: GO:0007033 ! vacuole organization
is_a: GO:0048308 ! organelle inheritance
[Term]
id: GO:0000012
name: single strand break repair
namespace: biological_process
def: "The repair of single strand breaks in DNA. Repair of such breaks is mediated by the same enzyme systems as are used in base excision repair." [http://www.ultranet.com/~jkimball/BiologyPages/D/DNArepair.html]
subset: gosubset_prok
is_a: GO:0006281 ! DNA repair
[Term]
id: GO:0000014
name: single-stranded DNA endodeoxyribonuclease activity
namespace: molecular_function
def: "Catalysis of the hydrolysis of ester linkages within a single-stranded deoxyribonucleic acid molecule by creating internal breaks." [GOC:mah]
synonym: "single-stranded DNA specific endodeoxyribonuclease activity" RELATED []
synonym: "ssDNA-specific endodeoxyribonuclease activity" RELATED [GOC:mah]
is_a: GO:0004520 ! endodeoxyribonuclease activity
The output I want to produce is:
GO:0000010 molecular_function
trans-hexaprenyltranstransferase activity
GO:0016765 ! transferase activity, transferring alkyl or aryl (other than methyl) groups
GO:0000011 biological_process
vacuole inheritance
is_a: GO:0007033 ! vacuole organization
is_a: GO:0048308 ! organelle inheritance
GO:0000012 biological_process
single strand break repair
is_a: GO:0006281 ! DNA repair
GO:0000014 molecular_function
single-stranded DNA endodeoxyribonuclease activity
is_a: GO:0004520 ! endodeoxyribonuclease activity
My code is:
import re

id_to_info = {}  # declare dictionary

def parse_record(term):
    go_id = re.findall(r"id:\s(.*?)\n", term, re.DOTALL)
    name = re.findall(r"name:\s(.*?)\n", term, re.DOTALL)
    namespace = re.findall(r"namespace:\s(.*?)\n", term, re.DOTALL)
    is_a = re.findall(r"is_a:\s(.*?)\n", term, re.DOTALL)
    info = namespace + "\n" + name + "\n" + is_a
    id_to_info[go_id] = info
    for go_id, info in id_to_info.interitems():
        print(go_id + "\t" + info)

def split_record(record):
    sp_file = open(record)
    sp_records = sp_file.read()
    sp_split_records = re.findall(r"(\[.*?)\n\n", sp_records, re.DOTALL)
    for sp_record in sp_split_records:
        parse_record(term=sp_record)
    sp_file.close()

split_record(record="go.rtf")
I really don't know where I'm going wrong, but I suspect the main problem is how I call the dictionary?
Answer 0 (score: 2)
import re

id_to_info = {}  # declare dictionary

def parse_record(term):
    # re.findall returns a list, so take the first match with [0]
    go_id = re.findall(r"id:\s(.*?)\n", term, re.DOTALL)[0]
    name = re.findall(r"name:\s(.*?)\n", term, re.DOTALL)[0]
    namespace = re.findall(r"namespace:\s(.*?)\n", term, re.DOTALL)[0]
    is_a = re.findall(r'is_a:(.*)', term, re.DOTALL)[0]
    info = namespace + "\n" + name + "\n" + is_a
    id_to_info[go_id] = info
    for go_id, info in id_to_info.iteritems():
        print(go_id + "\t" + info)

def split_record(record):
    sp_file = open(record)
    sp_records = sp_file.read()
    sp_split_records = re.findall(r"(\[.*?)\n\n", sp_records, re.DOTALL)
    for sp_record in sp_split_records:
        parse_record(term=sp_record)
    sp_file.close()

split_record(record="go.rtf")
I'd suggest NOT using an IDE; use a terminal instead, or at least debug in the interpreter:
Python 2.7.10 (default, Jul 30 2016, 18:31:42)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> s = """[Term]
... id: GO:0000010
... name: trans-hexaprenyltranstransferase activity
... namespace: molecular_function
... def: "Catalysis of the reaction: all-trans-hexaprenyl diphosphate + isopentenyl diphosphate = all-trans-heptaprenyl diphosphate + diphosphate." [KEGG:R05612, RHEA:20839]
... subset: gosubset_prok
... xref: KEGG:R05612
... xref: RHEA:20839
... is_a: GO:0016765 ! transferase activity, transferring alkyl or aryl (other than methyl) groups"""
>>> import re
>>> re.findall(r'is_a:(.*)', s)
[' GO:0016765 ! transferase activity, transferring alkyl or aryl (other than methyl) groups']
Also, print a lot. Python is dynamic, which means there is no separate compile step before running; it just keeps running until it hits an error.
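To illustrate the point about Python only complaining when a line actually runs, here is a minimal sketch (assuming Python 3; `show_all` is a made-up helper, not part of the question's code). The misspelled `interitems` goes unnoticed when the function is defined and only raises `AttributeError` once the function is called:

```python
# Sample data standing in for the dictionary built from the file.
id_to_info = {"GO:0000011": "biological_process"}

def show_all(mapping):
    # 'interitems' is the typo from the question; dict has no such attribute.
    for key, value in mapping.interitems():
        print(key + "\t" + value)

# Defining the function did not raise anything.
print("definition accepted, nothing has failed yet")

try:
    show_all(id_to_info)
    error_name = None
except AttributeError as exc:
    # The typo is only reported here, when the line is executed.
    error_name = type(exc).__name__
    print("failed at call time:", exc)
```

So if the code that contains the typo is never reached, you see neither output nor an error.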
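Your problems:
1) RegEx: google around
2) Typo: iteritems!
You can also read the Python docs. They are really good. Or pick up any book. You'll learn a lot by writing code and experimenting in the interpreter.
--- Python lover!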
Answer 1 (score: 1)
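re.findall returns a list of everything it found; your code assumes a string. Since there is only one hit per field, just add [0] wherever that works. is_a can come back empty, so it needs gentler handling.
Also, the (key, value) method is iteritems (iterate items), not interitems.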
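A quick way to see the list-versus-string point is a small sketch like the one below. The record text is a shortened sample and the `subset` lookup is only there to show the empty case:

```python
import re

# Shortened sample record; the real file has more fields.
term = (
    "id: GO:0000011\n"
    "name: vacuole inheritance\n"
    "is_a: GO:0007033 ! vacuole organization\n"
    "is_a: GO:0048308 ! organelle inheritance\n"
)

names = re.findall(r"name:\s(.*?)\n", term)      # list with one string
is_a = re.findall(r"is_a:\s(.*?)\n", term)       # list with two strings
missing = re.findall(r"subset:\s(.*?)\n", term)  # field absent: empty list

print(names)
print(names[0])
print(is_a)
print(missing)

# Indexing an empty list raises IndexError, so guard before taking [0].
first_subset = missing[0] if missing else ""
```

Concatenating `names + "\n"` directly would raise a TypeError, because a list cannot be added to a string.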
Here is the update:
import re

id_to_info = {}  # declare dictionary

def parse_record(term):
    go_id = re.findall(r"id:\s(.*?)\n", term, re.DOTALL)[0]
    name = re.findall(r"name:\s(.*?)\n", term, re.DOTALL)[0]
    namespace = re.findall(r"namespace:\s(.*?)\n", term, re.DOTALL)[0]
    is_a = re.findall(r"is_a:\s(.*?)\n", term, re.DOTALL)
    # is_a may be missing, so fall back to an empty string
    is_a = is_a[0] if is_a else ""
    # print namespace, name, is_a
    info = namespace + "\n" + name + "\n" + is_a
    id_to_info[go_id] = info
    for go_id, info in id_to_info.iteritems():
        print(go_id + "\t" + info)

def split_record(record):
    sp_file = open(record)
    sp_records = sp_file.read()
    sp_split_records = re.findall(r"(\[.*?)\n\n", sp_records, re.DOTALL)
    for sp_record in sp_split_records:
        parse_record(term=sp_record)
    sp_file.close()

split_record(record="go.rtf")
Output:
GO:0000010 molecular_function
trans-hexaprenyltranstransferase activity
GO:0016765 ! transferase activity, transferring alkyl or aryl (other
GO:0000011 biological_process
vacuole inheritance
GO:0007033 ! vacuole organization
GO:0000010 molecular_function
trans-hexaprenyltranstransferase activity
GO:0016765 ! transferase activity, transferring alkyl or aryl (other
GO:0000011 biological_process
vacuole inheritance
GO:0007033 ! vacuole organization
GO:0000010 molecular_function
trans-hexaprenyltranstransferase activity
GO:0016765 ! transferase activity, transferring alkyl or aryl (other
GO:0000012 biological_process
single strand break repair
I'll leave the rest of the formatting to you. :-)
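For anyone who wants a starting point for that remaining formatting, here is one possible sketch, not the only way to do it. It assumes Python 3 (so `items()` rather than `iteritems()`), splits on the `[Term]` header lines instead of blank lines, keeps every `is_a` line, and prints once after parsing. `parse_terms`, `first_field` and `SAMPLE` are names invented for this example:

```python
import re

# Inline sample so the sketch runs without the real file.
SAMPLE = """[Term]
id: GO:0000011
name: vacuole inheritance
namespace: biological_process
is_a: GO:0007033 ! vacuole organization
is_a: GO:0048308 ! organelle inheritance

[Term]
id: GO:0000012
name: single strand break repair
namespace: biological_process
subset: gosubset_prok
is_a: GO:0006281 ! DNA repair
"""

def first_field(block, key):
    # Return the value of the first "key: value" line, or "" if absent.
    match = re.search(r"^%s:\s(.+)$" % re.escape(key), block, re.MULTILINE)
    return match.group(1) if match else ""

def parse_terms(text):
    id_to_info = {}
    # Everything before the first [Term] header is skipped.
    blocks = re.split(r"^\[Term\]\s*$", text, flags=re.MULTILINE)[1:]
    for block in blocks:
        go_id = first_field(block, "id")
        if not go_id:
            continue  # ignore blocks without an id line
        parents = re.findall(r"^is_a:\s(.+)$", block, re.MULTILINE)
        lines = [first_field(block, "namespace"), first_field(block, "name")]
        lines += ["is_a: " + parent for parent in parents]
        id_to_info[go_id] = "\n".join(lines)
    return id_to_info

result = parse_terms(SAMPLE)
# Print once, after the whole input has been parsed.
for go_id, info in result.items():
    print(go_id + "\t" + info)
```

To run it on a file, read the text with `open(path).read()` and pass it to `parse_terms`. Printing after parsing, rather than inside the per-record function, avoids the repeated entries seen in the output above.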