Unable to get all attributes when parsing an XML file with Python

Time: 2015-06-30 19:16:41

Tags: python xml parsing attributes

This is the XML file 'test_xml2.xml':

<feed xml:lang='en'>
  <title>HackerRank</title>
  <subtitle lang='en'>Programming challenges</subtitle>
  <link rel='alternate' type='text/html' href='http://hackerrank.com/'/>
  <updated>2013-12-25T12:00:00</updated>
  <entry>
    <author gender='male'>Harsh</author>
    <question type='hard'>XML 1</question>
    <description type='text'>This is related to XML parsing</description>
  </entry>
</feed>

It actually has 8 attributes.

But with my code:

import xml.etree.ElementTree as etree

count = 0
xml = 'test_xml2.xml'
tree = etree.parse(xml)
root = tree.getroot()
for item in root:
    count += len(item.attrib)
    print item.keys()
print count

I get the result '4'.

Can someone tell me what is going wrong?

3 answers:

Answer 0: (score: 1)

This loop:

for item in root:
    count += len(item.attrib)

iterates over the immediate children of root, not the grandchildren or deeper descendants.

Use root.iter() instead, which visits root itself and every descendant:

for item in root.iter():
    count += len(item.attrib)
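
As a self-contained sketch of the root.iter() approach, the snippet below inlines the question's XML as a string (an assumption for illustration; the original code reads test_xml2.xml from disk) and wraps the loop in a hypothetical helper named count_attributes. It is written for Python 3.

```python
import xml.etree.ElementTree as ET

# The XML from the question, inlined so the example runs without a file.
XML_TEXT = """<feed xml:lang='en'>
  <title>HackerRank</title>
  <subtitle lang='en'>Programming challenges</subtitle>
  <link rel='alternate' type='text/html' href='http://hackerrank.com/'/>
  <updated>2013-12-25T12:00:00</updated>
  <entry>
    <author gender='male'>Harsh</author>
    <question type='hard'>XML 1</question>
    <description type='text'>This is related to XML parsing</description>
  </entry>
</feed>"""


def count_attributes(root):
    """Sum the attribute counts of root and all of its descendants."""
    # Element.iter() yields the element itself first, then every descendant.
    return sum(len(element.attrib) for element in root.iter())


root = ET.fromstring(XML_TEXT)
total = count_attributes(root)
print(total)  # 8
```

Because iter() starts with root itself, the xml:lang attribute on <feed> is included in the total.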

Answer 1: (score: 0)

The items in root are the title, subtitle, link, updated and entry nodes; subtitle has 1 attribute (lang) and link has 3 (rel, type and href): 4 attributes.

Your code needs to dive into the items in the items of root (entry, specifically).
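
To make the arithmetic concrete, here is a small sketch that first tallies attributes for each direct child of root (reproducing the 4) and then recurses into every element's children. The XML is inlined as a string rather than read from test_xml2.xml, and the helper name count_recursive is hypothetical; Python 3 is assumed.

```python
import xml.etree.ElementTree as ET

# The XML from the question, inlined so the example runs without a file.
XML_TEXT = """<feed xml:lang='en'>
  <title>HackerRank</title>
  <subtitle lang='en'>Programming challenges</subtitle>
  <link rel='alternate' type='text/html' href='http://hackerrank.com/'/>
  <updated>2013-12-25T12:00:00</updated>
  <entry>
    <author gender='male'>Harsh</author>
    <question type='hard'>XML 1</question>
    <description type='text'>This is related to XML parsing</description>
  </entry>
</feed>"""

root = ET.fromstring(XML_TEXT)

# What the original loop sees: only the direct children of root.
per_child = {child.tag: len(child.attrib) for child in root}
print(sum(per_child.values()))  # 4 (subtitle: 1, link: 3)


def count_recursive(element):
    """Count attributes on element, then dive into each of its children."""
    return len(element.attrib) + sum(count_recursive(child) for child in element)


print(count_recursive(root))  # 8
```

The difference of 4 comes from the xml:lang attribute on <feed> plus the three attributes on the children of <entry>.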

Answer 2: (score: 0)

The loop for item in root: iterates only over the immediate children of root, not over its deeper descendants.

One way to meet your requirement is to use the XPath expression .//* with findall() to get all descendant elements in the XML (as a list), and then iterate over that list to collect the attributes.

Note that .//* does not return the root itself, so count needs to be initialized with the length of root's attrib.

Example:

>>> count = len(root.attrib)
>>> elements = root.findall(".//*")
>>> for item in elements:
...     count += len(item.attrib)
...     print(item.keys())
[]
['lang']
['href', 'type', 'rel']
[]
[]
['gender']
['type']
['type']
>>> print(count)
8