Question

I have an XML file like this:

<dep type="nsubj">
            <governor idx="7">open</governor>
            <dependent idx="5">it</dependent>
          </dep>
          <dep type="aux">
            <governor idx="7">open</governor>
            <dependent idx="6">will</dependent>
          </dep>
          <dep type="ccomp">
            <governor idx="3">announced</governor>
            <dependent idx="7">open</dependent>
          </dep>

I want to parse it and extract the dep types, such as nsubj, aux, ccomp, and so on. This is what I'm doing:

file_list=[]
with open(xml_file) as f:
    page = f.read()
f.close()
soup = BeautifulSoup(page,"xml")
for types in soup.find_all('dep'):
    file_list.append(types.string.strip())
print file_list

However, I get a NoneType error. Why is this happening?

Edit:

Traceback:

Traceback (most recent call last):
  File "/Users/akritibahal/Downloads/stanford-corenlp-2012-07-09/testing.py", line 103, in <module>
    main()
  File "/Users/akritibahal/Downloads/stanford-corenlp-2012-07-09/testing.py", line 102, in main
    extract_top_dependencies('/Users/akritibahal/Downloads/stanford-corenlp-2012-07-09/test')
  File "/Users/akritibahal/Downloads/stanford-corenlp-2012-07-09/testing.py", line 80, in extract_top_dependencies
    file_list.append(types.string.strip())
AttributeError: 'NoneType' object has no attribute 'strip'

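Edit 2:

I think the way I'm parsing the XML reads the text between the < and > of the tags. But for dep, I want to extract what's in type=, and there is no text directly between its opening and closing tags. How do I do that?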

Answer 1

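Remove the

f.close()

line! The with open() syntax closes the file automatically when the block exits, so the explicit call is redundant. (The name f still exists after the block, but it refers to an already-closed file.)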
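
To make that concrete, here is a minimal sketch of what happens to f after the with block. The temporary file and its contents are assumptions made only so the snippet runs on its own; xml_file stands in for the path used in the question.

```python
import os
import tempfile

# Create a throwaway file so the sketch is self-contained (hypothetical content).
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as tmp:
    tmp.write('<dep type="nsubj"/>')
    xml_file = tmp.name

with open(xml_file) as f:
    page = f.read()

# The name f is still bound here, but the file object is already closed.
closed_after_with = f.closed
f.close()  # no-op on an already-closed file, which is why the line is redundant

os.remove(xml_file)
print(closed_after_with)
```

Since the extra close() does nothing, removing it tidies the code but is unlikely to be what raises the AttributeError in the traceback.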

Answer 2

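Judging by your edit (and the name types in your original for statement), you seem to be after the tag's attribute rather than its string. That also explains the error: .string is None whenever a tag has more than one child, and each dep contains a governor, a dependent, and the whitespace between them. To access a tag's attributes, try something like the following: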

>>> xml = """<root><dep type="nsubj">
            <governor idx="7">open</governor>
            <dependent idx="5">it</dependent>
          </dep>
          <dep type="aux">
            <governor idx="7">open</governor>
            <dependent idx="6">will</dependent>
          </dep>
          <dep type="ccomp">
            <governor idx="3">announced</governor>
            <dependent idx="7">open</dependent>
          </dep></root>"""
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(xml, "xml")  # same parser as in the question (needs lxml)
>>> for dep in soup.find_all('dep'):
...     print(dep.attrs.get('type'))

nsubj
aux
ccomp

In other words, I think you want something like this:

>>> file_list = []
>>> for dep_elem in soup.find_all('dep'):
...     type_ = dep_elem.attrs.get('type')
...     if type_:  # skip tags without a type attribute (get() returns None)
...         file_list.append(type_.strip())

See the Beautiful Soup documentation on tag attributes for details.
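
If you would rather not depend on Beautiful Soup, the same attribute lookup can be sketched with the standard library's xml.etree.ElementTree. This is an alternative to the approach above, not the method used in it, and the root wrapper element is an assumption added because an XML document needs a single top-level element.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, wrapped in a <root> element (an assumption).
xml = """<root>
  <dep type="nsubj">
    <governor idx="7">open</governor>
    <dependent idx="5">it</dependent>
  </dep>
  <dep type="aux">
    <governor idx="7">open</governor>
    <dependent idx="6">will</dependent>
  </dep>
  <dep type="ccomp">
    <governor idx="3">announced</governor>
    <dependent idx="7">open</dependent>
  </dep>
</root>"""

root = ET.fromstring(xml)

# iter("dep") visits every <dep> element at any depth. get() returns None
# when the attribute is missing, so such elements are skipped.
file_list = [dep.get("type").strip() for dep in root.iter("dep") if dep.get("type")]
print(file_list)  # ['nsubj', 'aux', 'ccomp']
```

For a file on disk, ET.parse(xml_file).getroot() would take the place of ET.fromstring(xml).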

Getting a NoneType error when parsing an XML file in Python
