Parsing empty tags with ElementTree

Time: 2018-07-31 02:09:51

Tags: python xml-parsing elementtree

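I'm trying to parse XML with ElementTree, and I can't figure out how to handle empty tags such as <tag/>. If the tag is absent altogether, .find() returns None and everything works. For <tag/>, however, .find() returns an element, so the subsequent attempt to use its text fails with the error: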

TypeError: must be str, not NoneType

A failing example is below. It fails on the line <tl><mpa/></tl>.

from xml.etree import ElementTree

def getStuff(xml_message):
    message_tree = ElementTree.fromstring(xml_message)
    ns = {'a': 'http://www.example.org/a',
          'b': 'http://www.example.org/b'}          
    tls = message_tree.findall('.//b:tl', namespaces = ns)

    result, i = (0,)*2

    for tl in tls:
        i += 1     
        print("Item: " + str(i))
        mpa = tl.find("b:mpa", namespaces = ns)
        if mpa is None:
            result = result + 0
            print(" |--> Is None, assigned 0.")
        else:
            print(" |--> Is Something")
            # This is where things go terribly wrong
            print(" |--> Tag Value: " + mpa.text)
            result = result  + int(mpa.text)    
    return result

instr = """<?xml version="1.0" standalone='no'?>
<ncr xmlns="http://www.example.org/a">
  <x xmlns="http://www.example.org/b">
      <tl><ec code="N">e1</ec></tl>
      <tl><mpa>0010</mpa></tl>
      <tl><mpa/></tl>
  </x>
</ncr>
"""
getStuff(instr)

1 answer:

Answer 0 (score: 1)

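With the empty tag <mpa/>, your mpa variable is a valid node and therefore not None, but mpa.text is None because the element contains no text. String concatenation only works between two strings, so your attempt to concatenate " |--> Tag Value: " with None fails. Instead, you can use the string formatting operator, which formats None as 'None', and add a condition on the following line so that mpa.text is not converted to an integer when it is None: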

print(" |--> Tag Value: %s" % mpa.text)
if mpa.text is not None:
    result = result  + int(mpa.text)

With the changes above, the output becomes:

Item: 1
 |--> Is None, assigned 0.
Item: 2
 |--> Is Something
 |--> Tag Value: 0010
Item: 3
 |--> Is Something
 |--> Tag Value: None
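
For reference, here is a minimal sketch of how the whole function might look with both cases handled in one place: a missing tag (find() returns None) and an empty tag (the element exists but its text is None). The names sum_mpa and NS are illustrative and not from the original post, and treating whitespace-only text as 0 is an extra assumption on my part.

```python
from xml.etree import ElementTree

# Namespace prefixes copied from the question's example.
NS = {'a': 'http://www.example.org/a',
      'b': 'http://www.example.org/b'}


def sum_mpa(xml_message):
    """Sum the integer text of every b:mpa element.

    Missing tags, empty tags, and whitespace-only tags all count as 0
    (the whitespace rule is an assumption, not part of the original post).
    """
    root = ElementTree.fromstring(xml_message)
    total = 0
    for tl in root.findall('.//b:tl', namespaces=NS):
        mpa = tl.find('b:mpa', namespaces=NS)
        # mpa is None when the tag is absent;
        # mpa.text is None when the tag is present but empty, as in <mpa/>.
        text = mpa.text if mpa is not None else None
        if text is not None and text.strip():
            total += int(text)
    return total


instr = """<?xml version="1.0" standalone='no'?>
<ncr xmlns="http://www.example.org/a">
  <x xmlns="http://www.example.org/b">
      <tl><ec code="N">e1</ec></tl>
      <tl><mpa>0010</mpa></tl>
      <tl><mpa/></tl>
  </x>
</ncr>
"""

print(sum_mpa(instr))  # prints 10, since int("0010") is 10
```

Keeping the "is the element there?" check separate from the "does it have text?" check makes it explicit that these are two different conditions, which is the distinction the original code missed.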