Question

I retrieve an XML document this way:

import urllib2
import xml.etree.ElementTree as ET

root = ET.parse(urllib2.urlopen(url))
for child in root.findall("item"):
  a1 = child[0].text # ok
  a2 = child[1].text # ok
  a3 = child[2].text # ok
  a4 = child[3].text # BOOM
  # ...

The XML looks like this:

<item>
  <a1>value1</a1>
  <a2>value2</a2>
  <a3>value3</a3>
  <a4>
    <a11>value222</a11>
    <a22>value22</a22>
  </a4>
</item>

How can I check whether a4 (in this particular case, but it could be any other element) has children?

Answer 1

You can try the list function on the element:

>>> xml = """<item>
  <a1>value1</a1>
  <a2>value2</a2>
  <a3>value3</a3>
  <a4>
    <a11>value222</a11>
    <a22>value22</a22>
  </a4>
</item>"""
>>> root = ET.fromstring(xml)
>>> list(root[0])
[]
>>> list(root[3])
[<Element 'a11' at 0x2321e10>, <Element 'a22' at 0x2321e48>]
>>> len(list(root[3]))
2
>>> print "has children" if len(list(root[3])) else "no child"
has children
>>> print "has children" if len(list(root[2])) else "no child"
no child
>>> # Or simpler, without a call to list within len, it also works:
>>> print "has children" if len(root[3]) else "no child"
has children

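I modified your example slightly, because calling findall("item") on the <item> root does not work (findall searches the direct children, not the current element itself). If you want to access the text of the sub-children in your working program, you could do this: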

for child in root.findall("item"):
  # if there are children, get their text content as well.
  if len(child): 
    for subchild in child:
      subchild.text
  # else just get the current child text.
  else:
    child.text

This is a good fit for recursion.
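
As a rough sketch of that recursive idea (Python 3 assumed; the helper name leaf_texts is made up for illustration and is not part of ElementTree), one could walk down until an element has no children and collect its text:

```python
import xml.etree.ElementTree as ET

XML = """<item>
  <a1>value1</a1>
  <a2>value2</a2>
  <a3>value3</a3>
  <a4>
    <a11>value222</a11>
    <a22>value22</a22>
  </a4>
</item>"""


def leaf_texts(element):
    """Return the text of every childless element at or below `element`."""
    if len(element) == 0:
        # No children: this is a leaf, so report its own text.
        return [element.text]
    texts = []
    for child in element:
        # Children present: recurse instead of reading element.text,
        # which would only hold whitespace here.
        texts.extend(leaf_texts(child))
    return texts


root = ET.fromstring(XML)
print(leaf_texts(root))
# ['value1', 'value2', 'value3', 'value222', 'value22']
```

The len(element) check is the same test used in the interactive session above; the recursion only decides what to do with the result.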

Answer 2

The simplest way I could find is to use the element's bool value directly. This means you can use a4 in a conditional statement as-is:

from xml.etree.ElementTree import Element

a4 = Element('a4')
if a4:
    print('Has kids')
else:
    print('No kids yet')

a4.append(Element('x'))
if a4:
    print('Has kids now')
else:
    print('Still no kids')

Running this code prints

No kids yet
Has kids now

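An element's boolean value says nothing about text, tail, or attributes. It only indicates whether children are present, which is what the original question asked.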
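
One caveat, as far as I know: the ElementTree documentation cautions against relying on an element's truth value and marks that test as deprecated in recent Python versions. A minimal sketch of the explicit equivalent, using len() instead (the variable names before and after are only for illustration):

```python
from xml.etree.ElementTree import Element

a4 = Element('a4')
before = len(a4) > 0    # no children appended yet

a4.append(Element('x'))
after = len(a4) > 0     # one child appended

print(before, after)
# False True
```

This gives the same answer as the bool test without depending on how Element defines truthiness.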

Answer 3

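The Element class has a getchildren method. So you could use something like this to check whether there are child nodes, storing the results in a dictionary keyed by tag name: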

result = {}
for child in root.findall("item"):
   if child.getchildren() == []:
      result[child.tag] = child.text
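
Note that getchildren() was deprecated and then removed in Python 3.9, so the snippet above only runs on older interpreters. A possible Python 3 variant, sketched here against the sample XML from the question (where <item> is itself the root, so the loop goes over its direct children), uses len(child) instead:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<item><a1>value1</a1><a2>value2</a2><a3>value3</a3>"
    "<a4><a11>value222</a11><a22>value22</a22></a4></item>"
)

result = {}
for child in root:
    # len(child) counts direct sub-elements; 0 means no children.
    if len(child) == 0:
        result[child.tag] = child.text

print(result)
# {'a1': 'value1', 'a2': 'value2', 'a3': 'value3'}
```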

Answer 4

Personally, I would recommend an XML parser that fully supports XPath expressions. The subset supported by xml.etree is insufficient for tasks like this.

For example, in lxml I can do:

"Give me all the children of the children of the <item> node":

doc.xpath('//item/*/child::*') #equivalent to '//item/*/*', if you're being terse
Out[18]: [<Element a11 at 0x7f60ec1c1348>, <Element a22 at 0x7f60ec1c1888>]

or,

"Give me all of <item>'s children that have no children themselves":

doc.xpath('/item/*[count(child::*) = 0]')
Out[20]: 
[<Element a1 at 0x7f60ec1c1588>,
 <Element a2 at 0x7f60ec1c15c8>,
 <Element a3 at 0x7f60ec1c1608>]

or,

"Give me all elements that have no children":

doc.xpath('//*[count(child::*) = 0]')
Out[29]: 
[<Element a1 at 0x7f60ec1c1588>,
 <Element a2 at 0x7f60ec1c15c8>,
 <Element a3 at 0x7f60ec1c1608>,
 <Element a11 at 0x7f60ec1c1348>,
 <Element a22 at 0x7f60ec1c1888>]

# and if I only care about the text from those nodes...
doc.xpath('//*[count(child::*) = 0]/text()')
Out[30]: ['value1', 'value2', 'value3', 'value222', 'value22']
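
If installing lxml is not an option, the same three queries can be approximated with plain Python over the standard library's ElementTree. This is only a sketch under the assumption that the document is the sample from the question; it replaces the XPath predicates with list comprehensions and is not a general XPath substitute:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<item><a1>value1</a1><a2>value2</a2><a3>value3</a3>"
    "<a4><a11>value222</a11><a22>value22</a22></a4></item>"
)

# Roughly '//item/*/*': children of <item>'s children.
grandchildren = [g.tag for c in root for g in c]

# Roughly '/item/*[count(child::*) = 0]': <item>'s childless children.
childless_children = [c.tag for c in root if len(c) == 0]

# Roughly '//*[count(child::*) = 0]': every childless element, in document order.
all_leaves = [e.tag for e in root.iter() if len(e) == 0]

print(grandchildren)        # ['a11', 'a22']
print(childless_children)   # ['a1', 'a2', 'a3']
print(all_leaves)           # ['a1', 'a2', 'a3', 'a11', 'a22']
```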

Answer 5

You can use the iter method:

import xml.etree.ElementTree as ET

etree = ET.parse('file.xml')
root = etree.getroot()
a = []
for child in root.iter():
    if child.text:
        if len(child.text.split()) > 0:
            a.append(child.text)
print(a)

Answer 6

There is a very simple approach:

list(<element>)

If the list is empty, the element has no children.

Check whether an XML element has children in ElementTree
