I have an XML file like this:
<dep type="nsubj">
<governor idx="7">open</governor>
<dependent idx="5">it</dependent>
</dep>
<dep type="aux">
<governor idx="7">open</governor>
<dependent idx="6">will</dependent>
</dep>
<dep type="ccomp">
<governor idx="3">announced</governor>
<dependent idx="7">open</dependent>
</dep>
I want to parse it and extract the dep types, such as nsubj, aux, ccomp, etc. This is what I'm doing:
file_list=[]
with open(xml_file) as f:
    page = f.read()
f.close()
soup = BeautifulSoup(page,"xml")
for types in soup.find_all('dep'):
    file_list.append(types.string.strip())
print file_list
However, I get a NoneType error. Why is this happening?
EDIT:
Traceback:
Traceback (most recent call last):
  File "/Users/akritibahal/Downloads/stanford-corenlp-2012-07-09/testing.py", line 103, in <module>
    main()
  File "/Users/akritibahal/Downloads/stanford-corenlp-2012-07-09/testing.py", line 102, in main
    extract_top_dependencies('/Users/akritibahal/Downloads/stanford-corenlp-2012-07-09/test')
  File "/Users/akritibahal/Downloads/stanford-corenlp-2012-07-09/testing.py", line 80, in extract_top_dependencies
    file_list.append(types.string.strip())
AttributeError: 'NoneType' object has no attribute 'strip'
EDIT2:
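I think the way I'm parsing the XML only reads the content between the < > tags. But for dep, I want to extract what is in type=, and there is no text between its opening and closing tags. How do I do that?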
Answer 0 (score: 0)
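Remove the f.close() line! The with open() syntax closes the file automatically when the block exits, so the explicit call is redundant. (It is harmless rather than the cause of the error, though: the name f is still bound after the with block, and calling close() on an already-closed file does nothing.)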
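A quick way to check what the with statement does, as a minimal sketch that writes a throwaway temporary file (its contents are made up for illustration): once the block exits, the file object already reports closed, so a following f.close() has nothing left to do.

```python
import os
import tempfile

# Create a throwaway file to read (illustrative contents only).
fd, path = tempfile.mkstemp(suffix=".xml")
os.close(fd)
with open(path, "w") as f:
    f.write('<root><dep type="nsubj"/></root>')

with open(path) as f:
    page = f.read()

# The with statement has already closed the file at this point...
print(f.closed)  # True
# ...but the name f is still bound, and closing it again is a harmless no-op.
f.close()

os.remove(path)
```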
Answer 1 (score: 0)
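Based on your edit (and the name types in your original for statement), it seems you are after the tag's attributes rather than its string. To access tag attributes, try something along these lines: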
from bs4 import BeautifulSoup

xml = """<root><dep type="nsubj">
<governor idx="7">open</governor>
<dependent idx="5">it</dependent>
</dep>
<dep type="aux">
<governor idx="7">open</governor>
<dependent idx="6">will</dependent>
</dep>
<dep type="ccomp">
<governor idx="3">announced</governor>
<dependent idx="7">open</dependent>
</dep></root>"""
soup = BeautifulSoup(xml, "xml")  # "xml" parser (requires lxml), as in the question
for dep in soup.find_all('dep'):
    print(dep.attrs.get('type'))  # read the attribute, not the tag's text
# Output:
# nsubj
# aux
# ccomp
In other words, I think you want something like this:
file_list = []
for dep_elem in soup.find_all('dep'):
    type_ = dep_elem.attrs.get('type')
    if type_:  # get() returns None when the attribute is missing; skip those
        file_list.append(type_.strip())
See the documentation here.
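If you would rather not depend on BeautifulSoup (and on lxml for its "xml" parser), the standard library's xml.etree.ElementTree can do the same job. This is a swapped-in alternative, not what the answer above uses; extract_dep_types is a hypothetical helper name, and the sketch assumes the file is well-formed XML with a single root element (the fragment in the question would need wrapping, e.g. in <root>...</root> as in the answer's example).

```python
import xml.etree.ElementTree as ET


def extract_dep_types(xml_path):
    """Collect the type attribute of every <dep> element in an XML file.

    Hypothetical helper; assumes well-formed XML with a single root element.
    """
    file_list = []
    # ET.parse opens, reads and closes the file itself.
    tree = ET.parse(xml_path)
    for dep_elem in tree.iter("dep"):
        # dep_elem.text is only the whitespace before <governor>;
        # the dependency type is stored in the attribute.
        type_ = dep_elem.get("type")
        if type_:  # skip <dep> elements without a type attribute
            file_list.append(type_.strip())
    return file_list
```

For the three <dep> elements in the question wrapped in a root element and saved as, say, deps.xml (a hypothetical filename), extract_dep_types("deps.xml") would return ['nsubj', 'aux', 'ccomp'].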