Question

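The original post seemed too vague, so I have narrowed the focus of this post. I have an XML file from which I want to pull the values of specific branches, and I am having difficulty understanding how to navigate the XML paths effectively. Consider the XML file below. There are several <mi> branches. I want to store the <r> value of certain branches, but not others. In this example, I want the <r> values of counter1 and counter3, but not counter2.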

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="Data.xsl" ?>
<!DOCTYPE mdc SYSTEM "Data.dtd">
<mdc xmlns:HTML="http://www.w3.org/TR/REC-xml">
<mfh>
<vn>TEST</vn>
<cbt>20140126234500.0+0000</cbt>
</mfh>
<mi>
    <mts>20140126235000.0+0000</mts>
    <mt>counter1</mt>
    <mv>
        <moid>DEFAULT</moid>
        <r>58</r>
    </mv>
</mi>
<mi>
    <mts>20140126235000.0+0000</mts>
    <mt>counter2</mt>
    <mv>
        <moid>DEFAULT</moid>
        <r>100</r>
    </mv>
</mi>
<mi>
    <mts>20140126235000.0+0000</mts>
    <mt>counter3</mt>
    <mv>
        <moid>DEFAULT</moid>
        <r>7</r>
    </mv>
</mi>
</mdc>

From that I would like to build a tuple with the following: ('20140126234500.0+0000', 58, 7), where 20140126234500.0+0000 comes from <cbt>, 58 comes from the <r> value of the <mi> element that has <mt>counter1</mt>, and 7 comes from the <mi> element that has <mt>counter3</mt>.

I would like to use xml.etree.cElementTree because it seems to be standard and should be more than capable for my purposes. But I am having difficulty navigating the tree and extracting the values I need. Below is some of what I have tried.

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

tree = ET.ElementTree(file='Data.xml')
root = tree.getroot()
for mi in root.iter('mi'):
    print(mi.tag)
    for mt in mi.findall("./mt") if mt.value == 'counter1':
        print(mi.find("./mv/r").value)  # I know this is invalid syntax, but it's what I want to do :)

From a pseudocode standpoint, what I am trying to do is:

find the <cbt> value and store it in the first position of the tuple.
find the <mi> element where <mt>counter1</mt> exists and store the <r> value in the second position of the tuple.
find the <mi> element where <mt>counter3</mt> exists and store the <r> value in the third position of the tuple.

I am not clear on when to use element.iter(), element.findall(), or XPath. I also have not had much luck using them within functions or extracting the information I need.

Thanks, Rusty

Answer 1

Start with:

import xml.etree.cElementTree as ET  # or with try/except as per your edit (cElementTree was removed in Python 3.9)

xml_data = """<?xml version="1.0"?> and the rest of your XML here"""
tree = ET.fromstring(xml_data)  # or `ET.parse(<filename>)`
xml_dict = {}

Now tree holds the XML tree, and xml_dict is the dictionary in which you will collect the results.

# first get the key & val for 'cbt'
cbt_val = tree.find('mfh').find('cbt').text
xml_dict['cbt'] = cbt_val

The counters are in the 'mi' elements:

for elem in tree.findall('mi'):
    counter_name = elem.find('mt').text            # key
    counter_val = elem.find('mv').find('r').text   # value
    xml_dict[counter_name] = counter_val

At this point, xml_dict is:

>>> xml_dict
{'counter2': '100', 'counter1': '58', 'cbt': '20140126234500.0+0000', 'counter3': '7'}

This can be shortened, though possibly at the cost of readability. The code inside the for elem in tree.findall('mi'): loop can be:

xml_dict[elem.find('mt').text] = elem.find('mv').find('r').text
# that combines the key/value extraction to one line

Or, building xml_dict takes just two lines, first the counters and then cbt:

xml_dict = {elem.find('mt').text: elem.find('mv').find('r').text for elem in tree.findall('mi')}
xml_dict['cbt'] = tree.find('mfh').find('cbt').text
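To get from this dictionary to the tuple the question asks for, one option is to pick out only the wanted counters. The following is a minimal sketch rather than part of the original answer: the XML is cut down to the relevant elements, the names wanted and result are illustrative, and the int() conversion assumes that <r> always holds an integer.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the question's XML (assumption: only these elements matter here)
xml_data = """<mdc>
<mfh><vn>TEST</vn><cbt>20140126234500.0+0000</cbt></mfh>
<mi><mt>counter1</mt><mv><moid>DEFAULT</moid><r>58</r></mv></mi>
<mi><mt>counter2</mt><mv><moid>DEFAULT</moid><r>100</r></mv></mi>
<mi><mt>counter3</mt><mv><moid>DEFAULT</moid><r>7</r></mv></mi>
</mdc>"""
root = ET.fromstring(xml_data)

# Same two-line dictionary construction as above
xml_dict = {elem.find('mt').text: elem.find('mv').find('r').text
            for elem in root.findall('mi')}
xml_dict['cbt'] = root.find('mfh').find('cbt').text

# Hypothetical selection step: keep only the counters the question wants
wanted = ('counter1', 'counter3')
result = (xml_dict['cbt'],) + tuple(int(xml_dict[name]) for name in wanted)
print(result)  # ('20140126234500.0+0000', 58, 7)
```

Keeping the list of wanted counters in one place means counter2 is simply never looked up, and the order of wanted fixes the order of the tuple.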

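Edit:

From the docs, Element.findall() finds only elements with the given tag that are direct children of the current element.

find() returns only the first matching direct child.

iter() iterates recursively over all elements.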
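The difference between the three methods can be seen on a cut-down version of the question's XML. This is a rough sketch and not part of the original answer; the variable names are illustrative. The last statement also touches on the XPath part of the question: ElementTree supports a limited XPath subset in which a [tag='text'] predicate selects elements that have a child with the given text.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the question's XML (assumption: two counters are enough to show the point)
root = ET.fromstring("""<mdc>
<mfh><cbt>20140126234500.0+0000</cbt></mfh>
<mi><mt>counter1</mt><mv><r>58</r></mv></mi>
<mi><mt>counter2</mt><mv><r>100</r></mv></mi>
</mdc>""")

# findall('r') looks only at direct children of root; <r> is nested deeper, so nothing matches
direct_r = root.findall('r')

# find('mi') returns just the first matching direct child
first_mt = root.find('mi').find('mt').text

# iter('r') walks the whole subtree, so it reaches the nested <r> elements
all_r = [r.text for r in root.iter('r')]

# XPath predicate: the <mi> whose <mt> child has the text 'counter1', then down to its <r>
r_counter1 = root.find("./mi[mt='counter1']/mv/r").text

print(direct_r, first_mt, all_r, r_counter1)
```

With a path like the one in the last statement, a single find() call may be enough to pull one counter's value without looping over every <mi> element.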
