My XML uses a namespace and looks like this:
<metadata xmlns="http://example.com">
<samples>
<sample>
<hashes>
<hash type="md5">Abc6FC6F4AA4C5315D2A52E29865F7F6</hash>
</hashes>
<detections>
<detection vendor="example_1" date="2015-02-17T01:55:38" type="human" >
<![CDATA[my_detection1]]>
</detection>
<detection vendor="example_2" date="2015-02-17T01:55:38" type="computer" >
<![CDATA[my_detection2]]>
</detection>
</detections>
</sample>
<sample>
<hashes>
<hash type="md5">CDEFC6F4AA4C5315D2A52E29865F7F6</hash>
</hashes>
<detections>
<detection vendor="example_3" date="2015-02-17T01:55:38" type="human" >
<![CDATA[my_detection3]]>
</detection>
<detection vendor="example_4" date="2015-02-17T01:55:38" type="computer" >
<![CDATA[my_detection4]]>
</detection>
</detections>
</sample>
</samples>
</metadata>
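I want to extract data as follows:
If a particular "md5" hash matches, check the "vendor" attribute of each "detection" element; if that matches too, extract the "date" attribute and the text value (for example, "my_detection1").
The file is very large and contains a great many "sample" elements. Thanks.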
Answer 0 (score: 0)
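Thanks, everyone! I finally worked out how to do this. In my experience, the DOM in Python is the best fit for tricky XML handling that needs a lot of if/else logic: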
import xml.dom.minidom

# Parse the whole file into an in-memory DOM tree.
dom = xml.dom.minidom.parse("C:/tmp/merged.xml")
hash_node = dom.getElementsByTagName('hash')
md5 = '7CD6FC6F4AA4C5315D2A52E29865F7F6'

for node1 in hash_node:
    # wholeText joins adjacent text and CDATA nodes into one string.
    str1 = node1.childNodes[0].wholeText
    if str1 == md5:
        # Walk up the tree: <hash> -> <hashes> -> <sample>.
        hashes_node = node1.parentNode
        sample_node = hashes_node.parentNode
        detection_node = sample_node.getElementsByTagName('detection')
        print("For MD5 " + md5 + ",\n\n")
        for node2 in detection_node:
            # Strip the newlines that surround the CDATA section.
            print(node2.childNodes[0].wholeText.strip())
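
The code above loads the whole file into memory and does not yet check "vendor" or read "date". Since the question mentions a very large file, here is a rough sketch of a streaming alternative using `xml.etree.ElementTree.iterparse`. This is a different technique from the DOM approach above, not a drop-in fix for it. The function name `find_detections` and the trimmed `SAMPLE` data are my own, made up for illustration; the namespace URI is taken from the question, and the md5 comparison is an exact, case-sensitive string match.

```python
import io
import xml.etree.ElementTree as ET

NS = "{http://example.com}"  # default namespace from the question's XML

def find_detections(source, md5, vendor):
    """Yield (date, text) for detections by `vendor` in samples whose md5 matches.

    `source` may be a file name or a binary file object (hypothetical helper).
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != NS + "sample":
            continue  # only act once a whole <sample> has been read
        hashes = {
            (h.text or "").strip()
            for h in elem.iter(NS + "hash")
            if h.get("type") == "md5"
        }
        if md5 in hashes:
            for det in elem.iter(NS + "detection"):
                if det.get("vendor") == vendor:
                    # ElementTree folds CDATA into .text, so strip the newlines.
                    yield det.get("date"), (det.text or "").strip()
        elem.clear()  # drop the finished <sample>'s children to save memory

# Trimmed copy of the question's data, so the sketch runs without a file.
SAMPLE = b"""<metadata xmlns="http://example.com">
<samples>
<sample>
<hashes>
<hash type="md5">Abc6FC6F4AA4C5315D2A52E29865F7F6</hash>
</hashes>
<detections>
<detection vendor="example_1" date="2015-02-17T01:55:38" type="human" >
<![CDATA[my_detection1]]>
</detection>
<detection vendor="example_2" date="2015-02-17T01:55:38" type="computer" >
<![CDATA[my_detection2]]>
</detection>
</detections>
</sample>
</samples>
</metadata>"""

for date, text in find_detections(
    io.BytesIO(SAMPLE), "Abc6FC6F4AA4C5315D2A52E29865F7F6", "example_1"
):
    print(date, text)
```

For the real file, pass the path (for example "C:/tmp/merged.xml") instead of the `io.BytesIO` object. Note that `elem.clear()` empties each processed "sample" but the emptied elements still hang off the root, so memory use grows slowly rather than staying perfectly flat.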