Python ElementTree - searching children/grandchildren in badly written XML

Date: 2015-09-28 15:56:15

Tags: python xml parsing

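I'm trying to parse badly structured XML and print each node's name and content (only when the content exists), and only when the value of String name="content" is more than 30 days.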

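So far I can search child elements with ElementTree, but I need help with the badly nested information. I can't change the XML because it is a report supplied by a vendor. I'm a complete beginner, so please point me toward what I need to do, or offer any better approach. Thanks in advance.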

Sample file:

<?xml version="1.0" encoding="UTF-8"?>
<ReportSection>
    <ReportHead>
        <Criteria>
            <HeadStuff value="Dont Care">
            </HeadStuff>
        </Criteria>
    </ReportHead>
    <ReportBody>
        <ReportSection name="UpTime" category="rule">
            <ReportSection name="NodeName.domain.net" category="node">
                <String name="node">NodeName.domain.net</String>
                <String name="typeName">Windows Server</String>
                <OID>-1y2p0ij32e8c8:-1y2p0idhghwg6</OID>
                <ReportSection name="UpTime" category="element">
                    <ReportSection name="2015-09-20 18:50:10.0" category="version">
                        <String name="version">UpTime</String>
                        <OID>-1y2p0ij32e8cj:-1y2p0ibspofhp</OID>
                        <Integer name="changeType">2</Integer>
                        <String name="changeTypeName">Modified</String>
                        <Timestamp name="changeTime" displayvalue="9/20/15 6:50 PM">1442793010000</Timestamp>
                        <ReportSection name="versionContent" category="versionContent">
                            <String name="content">12 day(s), 7 hour(s), 33 minute(s), 8 second(s)</String>
                            <String name="content"></String>
                        </ReportSection>
                    </ReportSection>
                </ReportSection>
            </ReportSection>
        </ReportSection>
    </ReportBody>
</ReportSection>
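Much of the nesting problem comes down to where ElementTree looks: a bare tag name such as "String" only matches direct children of the element you call it on, while a path starting with ".//" matches descendants at any depth. A minimal sketch with the standard library's xml.etree.ElementTree on a trimmed copy of the sample above (the HeadStuff placeholder and some sibling elements are left out; SAMPLE, direct, anywhere and via_iter are illustrative names):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample report (HeadStuff placeholder left out)
SAMPLE = """<ReportSection>
    <ReportBody>
        <ReportSection name="UpTime" category="rule">
            <ReportSection name="NodeName.domain.net" category="node">
                <String name="node">NodeName.domain.net</String>
                <ReportSection name="UpTime" category="element">
                    <ReportSection name="2015-09-20 18:50:10.0" category="version">
                        <ReportSection name="versionContent" category="versionContent">
                            <String name="content">12 day(s), 7 hour(s), 33 minute(s), 8 second(s)</String>
                            <String name="content"></String>
                        </ReportSection>
                    </ReportSection>
                </ReportSection>
            </ReportSection>
        </ReportSection>
    </ReportBody>
</ReportSection>"""

root = ET.fromstring(SAMPLE)

# A bare tag name only checks the root's direct children: nothing matches here
direct = root.findall("String")

# ".//" searches every descendant, however deeply it is nested
anywhere = root.findall(".//String[@name='content']")

# iter() also walks the whole subtree; filter on the name attribute by hand
via_iter = [el for el in root.iter("String") if el.get("name") == "content"]

print(len(direct), len(anywhere), len(via_iter))  # prints: 0 2 2
```

Note that the second, empty content element is still matched; its .text is None, which is why the answer below checks the text before using it.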

1 answer:

Answer 0 (score: 2)

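The idea is to find the content node, extract the number of days if it is present, check the value as needed, and then find the node name. Example (using lxml.etree):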

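import re

from lxml import etree

# Matches the leading day count, e.g. "12 day(s), 7 hour(s), ..."
pattern = re.compile(r"^(\d+) day\(s\)")

# Pass bytes: lxml refuses a str that carries an encoding declaration
data = b"""your XML here"""
tree = etree.fromstring(data)

# Text of the first <String name="content"> at any depth (None if there is none)
content = tree.findtext(".//String[@name='content']")
if content:
    match = pattern.search(content)
    if match:
        days = int(match.group(1))

        # TODO: check the days if needed, e.g. only continue when days > 30

        node = tree.findtext(".//String[@name='node']")

        print(node, days)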

Prints:

NodeName.domain.net 12
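The code above reads only the first content string in the whole file and leaves the 30-day check as a TODO. A minimal sketch that extends the same idea to every node in a report, using the standard library's xml.etree.ElementTree (which the question started from) instead of lxml. It assumes, as in the sample, that each machine appears as a ReportSection with category="node". It converts the whole uptime string to a timedelta, so that, for example, 30 days and 5 hours counts as more than 30 days. The helpers parse_uptime and long_uptimes and the second node other.domain.net in the test data are hypothetical:

```python
import re
import xml.etree.ElementTree as ET
from datetime import timedelta

# Hypothetical helper pattern: picks out "<number> <unit>(s)" pairs
UNIT_RE = re.compile(r"(\d+) (day|hour|minute|second)\(s\)")


def parse_uptime(text):
    """Turn "12 day(s), 7 hour(s), ..." into a timedelta; None if nothing matches."""
    # "day" -> "days", etc., which are valid timedelta keyword arguments
    parts = {unit + "s": int(value) for value, unit in UNIT_RE.findall(text or "")}
    return timedelta(**parts) if parts else None


def long_uptimes(xml_bytes, threshold=timedelta(days=30)):
    """Yield (node name, uptime) for every node whose uptime exceeds threshold."""
    root = ET.fromstring(xml_bytes)
    # Assumption: one ReportSection with category="node" per machine
    for node_section in root.iterfind(".//ReportSection[@category='node']"):
        name = node_section.findtext("String[@name='node']")
        # The content strings sit several levels below the node section
        for content in node_section.iterfind(".//String[@name='content']"):
            uptime = parse_uptime(content.text)  # empty elements give None
            if uptime is not None and uptime > threshold:
                yield name, uptime


# Trimmed sample plus a made-up second node, for illustration only
REPORT = b"""<?xml version="1.0" encoding="UTF-8"?>
<ReportSection><ReportBody>
  <ReportSection name="UpTime" category="rule">
    <ReportSection name="NodeName.domain.net" category="node">
      <String name="node">NodeName.domain.net</String>
      <ReportSection name="UpTime" category="element">
        <ReportSection name="versionContent" category="versionContent">
          <String name="content">12 day(s), 7 hour(s), 33 minute(s), 8 second(s)</String>
          <String name="content"></String>
        </ReportSection>
      </ReportSection>
    </ReportSection>
    <ReportSection name="other.domain.net" category="node">
      <String name="node">other.domain.net</String>
      <ReportSection name="UpTime" category="element">
        <ReportSection name="versionContent" category="versionContent">
          <String name="content">45 day(s), 1 hour(s), 0 minute(s), 0 second(s)</String>
        </ReportSection>
      </ReportSection>
    </ReportSection>
  </ReportSection>
</ReportBody></ReportSection>"""

for name, uptime in long_uptimes(REPORT):
    print(name, uptime.days)  # prints: other.domain.net 45
```

The comparison is strict, so an uptime of exactly 30 days is skipped; pass a different threshold if the report should be filtered on another limit.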