How to get the total count of elements from root.iter() in ElementTree

Date: 2012-10-01 07:56:57

Tags: python xml-parsing elementtree

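I have an XML file whose root has 10 sub-children (all at the same level of the hierarchy) named 'testCase'.

I can't figure out the following. First, I am doing this to get all the sub-children: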

for testCase in root.iter('testCase'):
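  1. I need to get some attributes from the last 'testCase' sub-child. But how do I know that it is the last 'testCase'? Is there a way to count them?
  2. Also, is there a way to access the n-th sub-child without going through iter()?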

2 answers:

Answer 0 (score: 3)

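Try the following example. See the output below; it also shows the content of my.xml. An element behaves as a list of its children (i.e. it can also be iterated). There are functions and iterators that return all the wanted elements in document order, independently of their position (i.e. their depth, whose children they are, and so on do not matter). element.attrib behaves as a dictionary of attributes. The standard xml.etree.ElementTree also supports a subset of XPath; see the end of the example: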

import xml.etree.ElementTree as et

tree = et.parse('my.xml')
root = tree.getroot()     # the root element of the tree

et.dump(root)             # show what the input file looks like

# Any element behaves like a list of its children, so the last child
# can be accessed with a negative index.
print('-------------------------------------------')
print(root[-1])

# Here is its content.
print('-------------------------------------------')
et.dump(root[-1])

# findall('testCase') returns a list of the direct children with that tag
# (use './/testCase' to match at any depth). The last match is again the
# last element of the list.
print('-------------------------------------------')
lst = root.findall('testCase')
et.dump(lst[-1])

# The number of 'testCase' elements is simply the length of the list.
print('-------------------------------------------')
print('Num. of test cases:', len(lst))

# elem.iter('tag') works similarly, but it is a lazy iterator over all
# descendants. To get the last element you have to loop through all of
# them anyway.
print('-------------------------------------------')
last = None  # init
for e in root.iter('testCase'):
    last = e

et.dump(last)

# The attributes of an element are available as the dictionary .attrib.
print('-------------------------------------------')
print(last.attrib)
print(last.attrib['name'])

# The standard xml.etree.ElementTree supports a subset of XPath. You can use
# it if you are familiar with XPath.
print('-------------------------------------------')
third = root.find('.//testCase[3]')
et.dump(third)

# ... including the last() function. For more complex cases, use lxml
# as pointed out by Emmanuel.
print('-------------------------------------------')
last = root.find('.//testCase[last()]')
et.dump(last)

It prints the following on my console:

c:\tmp\___python\Sunny\so12669404>python a.py
<root>
  <testCase name="a" />
  <testCase name="b" />
  <testCase name="c" />
  <testCase name="d" />
</root>
-------------------------------------------
<Element 'testCase' at 0x231a630>
-------------------------------------------
<testCase name="d" />
-------------------------------------------
<testCase name="d" />
-------------------------------------------
Num. of test cases: 4
-------------------------------------------
<testCase name="d" />
-------------------------------------------
{'name': 'd'}
d
-------------------------------------------
<testCase name="c" />

-------------------------------------------
<testCase name="d" />
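
Since root.iter() returns a lazy iterator with no len(), one way to answer the original "count" question is to consume the iterator, or to materialize it into a list once when indexed access is also needed. The following is a minimal sketch under assumptions: the sample document and the 'suite' wrapper element are invented for illustration, to show how iter() (all descendants) differs from findall('testCase') (direct children only).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the <suite> wrapper is an assumption added
# here only to create one nested testCase.
XML = """<root>
  <testCase name="a" />
  <suite>
    <testCase name="b" />
  </suite>
  <testCase name="c" />
</root>"""

root = ET.fromstring(XML)

# iter() is lazy and has no len(); count the matches by consuming it.
count = sum(1 for _ in root.iter('testCase'))

# Materialize the matches once when both the count and indexing are needed.
cases = list(root.iter('testCase'))
last_case = cases[-1]      # last match in document order
second_case = cases[1]     # zero-based index, i.e. the 2nd match

# findall('testCase') only sees the direct children of root.
direct_children = root.findall('testCase')

print(count, last_case.get('name'), second_case.get('name'), len(direct_children))
```

If the elements are known to be direct children of the root, len(root.findall('testCase')) or plain indexing such as root[-1] avoids walking the whole tree.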

Answer 1 (score: 2)

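For this kind of operation you should use XPath, which is a common and easy way to navigate an XML tree. The standard Python ElementTree supports only a limited subset of XPath, but lxml (which is very widely used) supports XPath 1.0 in full. Here is an example: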

Getting the last child:

>>> text = """<Root>
    <Child name="child1" />
    <Child name="child2" />
    <Child name="child3" />
    <Child name="child4" />
    <Child name="child5" />
</Root>"""
>>> from lxml import etree
>>> root = etree.fromstring(text)
>>> last_tag = root.xpath('/Root/Child[last()]')[0]
>>> last_tag.attrib['name']
'child5'

Accessing element number n directly:

>>> tag3 = root.xpath('/Root/Child[3]')[0]
>>> tag3.attrib['name']
'child3'
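
For comparison, the same two lookups can probably be done without lxml, using the XPath subset in the standard library. This is a sketch rather than a drop-in replacement: it assumes the same sample document as above, and it uses paths relative to the root element, because the standard library's Element.find() does not accept absolute paths such as '/Root/Child'.

```python
import xml.etree.ElementTree as ET

text = """<Root>
    <Child name="child1" />
    <Child name="child2" />
    <Child name="child3" />
    <Child name="child4" />
    <Child name="child5" />
</Root>"""

root = ET.fromstring(text)

# Paths are relative to root, so 'Child' is used instead of '/Root/Child'.
last_tag = root.find('Child[last()]')   # last direct child named Child
tag3 = root.find('Child[3]')            # XPath positions are 1-based

# Total number of direct Child elements.
n_children = len(root.findall('Child'))

print(last_tag.attrib['name'], tag3.attrib['name'], n_children)
```

find() returns the element itself (or None when nothing matches), so there is no [0] indexing as with lxml's xpath(), which returns a list.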