Querying child values while traversing XML

Date: 2019-10-30 09:55:49

Tags: python tree-traversal minidom

I am parsing XML that contains entries similar to the following:

<ReportItem port="22" svc_name="ssh" protocol="tcp" severity="0" pluginID="22964" 
 pluginName="Service Detection" pluginFamily="Service detection">
<description>Nessus was able to identify the remote service by its banner or by looking at the error 
 message it sends when it receives an HTTP request.</description>
<fname>find_service.nasl</fname>
<plugin_modification_date>2019/08/14</plugin_modification_date>
<plugin_name>Service Detection</plugin_name>
<plugin_publication_date>2007/08/19</plugin_publication_date>
<plugin_type>remote</plugin_type>
<risk_factor>None</risk_factor>
<script_version>1.177</script_version>
<solution>n/a</solution>
<synopsis>The remote service could be identified.</synopsis>
<plugin_output>An SSH server is running on this port.</plugin_output>
</ReportItem>

I want to read the text value of plugin_name:

hostIter = iter(hostsByIP)
for host in hostIter:
    reportIter = iter(host.elements.childNodes)
    for reportItem in reportIter:
            childIter = iter(reportItem.childNodes)
            for reportChild in childIter:
                print(reportChild.nodeValue)
                #if child.nodeValue == "Traceroute Information":

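reportChild.nodeValue returns None, '\n', None, ... and so on.

reportChild.value raises an error: 'Text' object has no attribute 'value'

reportChild.localName correctly returns "plugin_name" and so on, but also returns None (I assume those are the text nodes?)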

2 Answers:

Answer 0 (score: 0)

You can use the XPath expression "./ReportItem/plugin_name" with ElementTree, as follows:
import xml.etree.ElementTree as ET

data = '''<?xml version="1.0"?><data><ReportItem port="22" svc_name="ssh" protocol="tcp" severity="0" pluginID="22964"   pluginName="Service Detection" pluginFamily="Service detection"> <description>Nessus was able to identify the remote service by its banner or by looking at the error   message it sends when it receives an HTTP request.</description> <fname>find_service.nasl</fname> <plugin_modification_date>2019/08/14</plugin_modification_date> <plugin_name>Service Detection</plugin_name> <plugin_publication_date>2007/08/19</plugin_publication_date> <plugin_type>remote</plugin_type> <risk_factor>None</risk_factor> <script_version>1.177</script_version> <solution>n/a</solution> <synopsis>The remote service could be identified.</synopsis> <plugin_output>An SSH server is running on this port.</plugin_output> </ReportItem><ReportItem port="22" svc_name="ssh" protocol="tcp" severity="0" pluginID="22964"   pluginName="Service Detection" pluginFamily="Service detection"> <description>Nessus was able to identify the remote service by its banner or by looking at the error   message it sends when it receives an HTTP request.</description> <fname>find_service.nasl</fname> <plugin_modification_date>2019/08/14</plugin_modification_date> <plugin_name>Service Detection2</plugin_name> <plugin_publication_date>2007/08/19</plugin_publication_date> <plugin_type>remote</plugin_type> <risk_factor>None</risk_factor> <script_version>1.177</script_version> <solution>n/a</solution> <synopsis>The remote service could be identified.</synopsis> <plugin_output>An SSH server is running on this port.</plugin_output> </ReportItem></data>'''

root = ET.fromstring(data)
for report in root.findall("./ReportItem/plugin_name"):
    print(report.text)

Output:

Service Detection
Service Detection2
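If the goal is to act on a specific plugin (the commented-out "Traceroute Information" check in the question suggests so), the same ElementTree approach can be extended roughly as below. This is only a sketch: the shortened SAMPLE document, its attribute values, and the helper name ports_for_plugin are made up for illustration and are not from the original data.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample in the same shape as the question's data.
SAMPLE = """<data>
<ReportItem port="22" svc_name="ssh" pluginID="22964">
<plugin_name>Service Detection</plugin_name>
<risk_factor>None</risk_factor>
</ReportItem>
<ReportItem port="0" svc_name="general" pluginID="10287">
<plugin_name>Traceroute Information</plugin_name>
<risk_factor>None</risk_factor>
</ReportItem>
</data>"""


def ports_for_plugin(xml_text, wanted_name):
    """Return the port attribute of every ReportItem whose plugin_name matches."""
    root = ET.fromstring(xml_text)
    ports = []
    # iter() walks the whole tree, so ReportItem may sit at any depth.
    for item in root.iter("ReportItem"):
        # findtext() returns None when the child element is missing.
        if item.findtext("plugin_name") == wanted_name:
            ports.append(item.get("port"))
    return ports


print(ports_for_plugin(SAMPLE, "Traceroute Information"))
```

Because findtext() looks the child up by tag name, the whitespace between tags never needs to be skipped by hand.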

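Answer 1 (score: 0)

You need to check the node type before trying to read the value. The whitespace between tags is parsed as text nodes, and an element node's own nodeValue is always None; its text lives in its first child: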

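hostIter = iter(hostsByIP)
for host in hostIter:
    reportIter = iter(host.elements.childNodes)
    for reportItem in reportIter:
        childIter = iter(reportItem.childNodes)
        for reportChild in childIter:
            # nodeType 1 is ELEMENT_NODE; whitespace between tags is a text node (nodeType 3)
            # firstChild is None for an empty element, so guard against that as well
            if reportChild.nodeType == 1 and reportChild.firstChild is not None:
                print(reportChild.firstChild.nodeValue)
                #if child.nodeValue == "Traceroute Information":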
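
For a version that can be run without the question's hostsByIP structure, here is a minimal minidom sketch of the same node-type check. The SAMPLE string and the helper name child_texts are assumptions made for illustration; the empty solution element is included only to show why the firstChild guard matters.

```python
from xml.dom import minidom, Node

# Hypothetical, shortened sample; <solution></solution> is deliberately empty.
SAMPLE = """<data>
<ReportItem port="22">
<plugin_name>Service Detection</plugin_name>
<solution></solution>
</ReportItem>
</data>"""


def child_texts(report_item):
    """Map each child element's tag name to its leading text, skipping non-element nodes."""
    result = {}
    for child in report_item.childNodes:
        if child.nodeType != Node.ELEMENT_NODE:
            continue  # whitespace between tags is parsed as a text node
        first = child.firstChild
        # An empty element has no firstChild, so fall back to an empty string.
        result[child.tagName] = first.nodeValue if first is not None else ""
    return result


doc = minidom.parseString(SAMPLE)
for item in doc.getElementsByTagName("ReportItem"):
    print(child_texts(item))
```

Node.ELEMENT_NODE is the named constant for the value 1 used above, which makes the intent of the check easier to read.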