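I'm trying to parse a poorly structured XML report and print each node's name together with the content of its String name="content" element, but only when that content exists and only when the uptime it reports is more than 30 days.
So far I can search for child elements with ElementTree, but I need help with the deeply nested information. I can't change the XML because it is a vendor-supplied report. I'm a complete beginner, so please tell me what I need to do, or what else I should provide to get better help. Thanks in advance.
Sample file: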
<?xml version="1.0" encoding="UTF-8"?>
<ReportSection>
  <ReportHead>
    <Criteria>
      <HeadStuff value=Dont Care>
      </HeadStuff>
    </Criteria>
  </ReportHead>
  <ReportBody>
    <ReportSection name="UpTime" category="rule">
      <ReportSection name="NodeName.domain.net" category="node">
        <String name="node">NodeName.domain.net</String>
        <String name="typeName">Windows Server</String>
        <OID>-1y2p0ij32e8c8:-1y2p0idhghwg6</OID>
        <ReportSection name="UpTime" category="element">
          <ReportSection name="2015-09-20 18:50:10.0" category="version">
            <String name="version">UpTime</String>
            <OID>-1y2p0ij32e8cj:-1y2p0ibspofhp</OID>
            <Integer name="changeType">2</Integer>
            <String name="changeTypeName">Modified</String>
            <Timestamp name="changeTime" displayvalue="9/20/15 6:50 PM">1442793010000</Timestamp>
            <ReportSection name="versionContent" category="versionContent">
              <String name="content">12 day(s), 7 hour(s), 33 minute(s), 8 second(s)</String>
              <String name="content"></String>
            </ReportSection>
          </ReportSection>
        </ReportSection>
      </ReportSection>
    </ReportSection>
  </ReportBody>
</ReportSection>
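Answer 0 (score: 2)
The idea is to locate the content node, extract the number of days from its text, check that value as needed, and then look up the node name. Example (using lxml.etree):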
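import re

from lxml import etree

# Capture the leading day count, e.g. "12 day(s), 7 hour(s), ..."
pattern = re.compile(r"^(\d+) day\(s\)")

# Bytes literal: lxml rejects str input that carries an encoding declaration
data = b"""your XML here"""

tree = etree.fromstring(data)

# findtext() returns the text of the first match; ".//" searches at any depth
content = tree.findtext(".//String[@name='content']")
if content:
    match = pattern.search(content)
    if match:
        days = int(match.group(1))
        # TODO: check the days if needed

        node = tree.findtext(".//String[@name='node']")
        print(node, days)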
Prints:
NodeName.domain.net 12
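
The snippet above only looks at the first content element in the whole report and leaves the 30-day check as a TODO. If the report holds several nodes, something along these lines might work. It is only a sketch using the standard-library xml.etree.ElementTree that the question mentions. The function name nodes_up_longer_than, the trimmed SAMPLE document, and the second node (OtherNode.domain.net, 45 days) are made up for illustration, and the sample leaves out the ReportHead block so that it is well-formed.

```python
import re
import xml.etree.ElementTree as ET

# Capture the leading day count, e.g. "12 day(s), 7 hour(s), ..."
DAYS_PATTERN = re.compile(r"^(\d+) day\(s\)")

# Trimmed, well-formed stand-in for the vendor report. The second node
# (OtherNode.domain.net, 45 days) is invented for illustration.
SAMPLE = """\
<ReportSection>
  <ReportBody>
    <ReportSection name="UpTime" category="rule">
      <ReportSection name="NodeName.domain.net" category="node">
        <String name="node">NodeName.domain.net</String>
        <ReportSection name="versionContent" category="versionContent">
          <String name="content">12 day(s), 7 hour(s), 33 minute(s), 8 second(s)</String>
          <String name="content"></String>
        </ReportSection>
      </ReportSection>
      <ReportSection name="OtherNode.domain.net" category="node">
        <String name="node">OtherNode.domain.net</String>
        <ReportSection name="versionContent" category="versionContent">
          <String name="content">45 day(s), 1 hour(s), 2 minute(s), 3 second(s)</String>
        </ReportSection>
      </ReportSection>
    </ReportSection>
  </ReportBody>
</ReportSection>
"""


def nodes_up_longer_than(xml_text, min_days=30):
    """Return (node name, days) pairs whose uptime exceeds min_days."""
    root = ET.fromstring(xml_text)
    results = []
    # iter() walks the whole tree, so the nesting depth does not matter.
    for section in root.iter("ReportSection"):
        if section.get("category") != "node":
            continue
        name = section.findtext("String[@name='node']")
        # Empty content elements have text None, hence the `or ""`.
        for content in section.iterfind(".//String[@name='content']"):
            match = DAYS_PATTERN.search(content.text or "")
            if match and int(match.group(1)) > min_days:
                results.append((name, int(match.group(1))))
                break  # report each node at most once
    return results


for name, days in nodes_up_longer_than(SAMPLE):
    print(name, days)
```

With the sample above, only the invented 45-day node passes the default threshold. Content that does not start with a day count (for example an uptime of a few hours) is skipped, which may or may not be what you want for your report.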