Getting a NoneType error when parsing an XML file in Python

Date: 2014-11-04 03:59:25

Tags: python nltk

I have an XML file like this:

<dep type="nsubj">
  <governor idx="7">open</governor>
  <dependent idx="5">it</dependent>
</dep>
<dep type="aux">
  <governor idx="7">open</governor>
  <dependent idx="6">will</dependent>
</dep>
<dep type="ccomp">
  <governor idx="3">announced</governor>
  <dependent idx="7">open</dependent>
</dep>

I want to parse it and extract the dep types, e.g. nsubj, aux, ccomp and so on. This is what I am doing:

file_list=[]
with open(xml_file) as f:
    page = f.read()
f.close()
soup = BeautifulSoup(page,"xml")
for types in soup.find_all('dep'):
    file_list.append(types.string.strip())
print file_list

However, I get a NoneType error. Why does this happen?

Edit:

Traceback:

Traceback (most recent call last):
  File "/Users/akritibahal/Downloads/stanford-corenlp-2012-07-09/testing.py", line 103, in <module>
    main()
  File "/Users/akritibahal/Downloads/stanford-corenlp-2012-07-09/testing.py", line 102, in main
    extract_top_dependencies('/Users/akritibahal/Downloads/stanford-corenlp-2012-07-09/test')
  File "/Users/akritibahal/Downloads/stanford-corenlp-2012-07-09/testing.py", line 80, in extract_top_dependencies
    file_list.append(types.string.strip())
AttributeError: 'NoneType' object has no attribute 'strip'

EDIT2:

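I think the way I am parsing the XML reads the text between the opening and closing tags. But for dep, I want to extract the value of type=, and there is no text directly between its opening and closing tags. How can I do that?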

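2 Answers:

Answer 0: (score: 0)

Remove the

f.close()

line! The file is closed automatically when the with open() block exits, so the call is redundant. After the block, the name f still exists, but it refers to a file that is already closed.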
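
To illustrate the point about with open(), here is a minimal sketch. The helper read_text and the temporary file are hypothetical, introduced only for this example; they are not part of the question's code.

```python
import os
import tempfile


def read_text(path):
    """Read a whole file; the with block closes it on exit."""
    with open(path) as f:
        page = f.read()
    # f is still bound here, but the underlying file is already closed,
    # so an extra f.close() would add nothing.
    return page, f.closed


# Hypothetical temporary file standing in for the real XML file.
fd, tmp_path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "w") as out:
    out.write('<dep type="nsubj"/>')

page, was_closed = read_text(tmp_path)
os.remove(tmp_path)

print(was_closed)
print(page)
```

The extra close is redundant rather than harmful, so removing it tidies the code but is unlikely to be what triggers the AttributeError in the traceback.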

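Answer 1: (score: 0)

Based on your edit (and the name types in your original for statement), it seems you are after the tag's attribute rather than its string. To access tag attributes, try something like this: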

>>> xml = """<root><dep type="nsubj">
            <governor idx="7">open</governor>
            <dependent idx="5">it</dependent>
          </dep>
          <dep type="aux">
            <governor idx="7">open</governor>
            <dependent idx="6">will</dependent>
          </dep>
          <dep type="ccomp">
            <governor idx="3">announced</governor>
            <dependent idx="7">open</dependent>
          </dep></root>"""
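>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(xml, "xml")  # same "xml" parser as in the question (needs lxml)
>>> for dep in soup.find_all('dep'):
...     print(dep.attrs.get('type'))
...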
nsubj
aux
ccomp

In other words, I think you want something like this:

>>> for dep_elem in soup.find_all('dep'):
...     type_ = dep_elem.attrs.get('type')
...     if type_:  # skip any <dep> that has no type attribute (get() returns None)
...         file_list.append(type_.strip())
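
As for why the original code raised the error: in Beautiful Soup, .string is None when a tag has more than one child, and each dep here contains two elements plus whitespace. The sketch below shows the difference on one element. The "html.parser" argument is an assumption made only so the example runs without lxml; the question itself uses "xml".

```python
from bs4 import BeautifulSoup

xml = """<root><dep type="nsubj">
  <governor idx="7">open</governor>
  <dependent idx="5">it</dependent>
</dep></root>"""

# "html.parser" is chosen only so this sketch runs without lxml installed.
soup = BeautifulSoup(xml, "html.parser")
dep = soup.find("dep")

# <dep> has several children (two elements plus whitespace), so .string is None.
dep_string = dep.string

# <governor> has exactly one text child, so .string returns that text.
governor_string = dep.find("governor").string

# The attribute value is read from the tag itself, not from its contents.
dep_type = dep.get("type")

print(dep_string, governor_string, dep_type)
```

Calling .strip() on dep_string is what produces "'NoneType' object has no attribute 'strip'".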

See the Beautiful Soup documentation for details.
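
If installing Beautiful Soup and lxml is not an option, roughly the same extraction can be sketched with the standard library's xml.etree.ElementTree. This is a swapped-in technique, not what the answer above uses. The function name extract_dep_types and the wrapping root element are assumptions for illustration; the fragment in the question has no single root, and ElementTree requires one.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, wrapped in a hypothetical <root> element.
SAMPLE = """<root>
  <dep type="nsubj">
    <governor idx="7">open</governor>
    <dependent idx="5">it</dependent>
  </dep>
  <dep type="aux">
    <governor idx="7">open</governor>
    <dependent idx="6">will</dependent>
  </dep>
  <dep type="ccomp">
    <governor idx="3">announced</governor>
    <dependent idx="7">open</dependent>
  </dep>
</root>"""


def extract_dep_types(xml_text):
    """Return the type attribute of every <dep> element, in document order."""
    root = ET.fromstring(xml_text)
    types = []
    for dep in root.iter("dep"):
        dep_type = dep.get("type")  # None when the attribute is missing
        if dep_type:
            types.append(dep_type.strip())
    return types


file_list = extract_dep_types(SAMPLE)
print(file_list)
```

For a file on disk, ET.parse(path).getroot() could replace ET.fromstring, provided the file has a single root element.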