Python: getting multiple results from getElementsByTagName

Time: 2014-11-19 00:20:39

Tags: python xml xml-parsing

I'm trying to get every instance of an XML tag, but I only seem to be able to return one or none.

#!/usr/software/bin/python

# import libraries
import urllib
from xml.dom.minidom import parseString

# variables
startdate    = "2014-01-01"
enddate      = "2014-05-01"
rest_client   = "test"
rest_host     = "restprd.test.com"
rest_port     = "80"
rest_base_url = "asup-rest-interface/ASUP_DATA"
rest_date     = "/start_date/%s/end_date/%s/limit/5000" % (startdate,enddate)
rest_api      = "http://" + rest_host + ":" + rest_port + "/" + rest_base_url + "/" + "client_id" + "/" + rest_client

response = urllib.urlopen(rest_api + rest_date + '/sys_serial_no/700000667725')

data = response.read()
response.close()
dom = parseString(data)
xmlVer = dom.getElementsByTagName('sys_version').toxml()
xmlDate = dom.getElementsByTagName('asup_gen_date').toxml()
xmlVerTag=xmlVer.replace('<sys_version>','').replace('</sys_version>','')
xmlDateTag=xmlDate.replace('<asup_gen_date>','').replace('</asup_gen_date>','').replace('T',' ')[0:-6]
print xmlDateTag ,  xmlVerTag

The code above produces the following error:

Traceback (most recent call last):
  File "./test.py", line 23, in <module>
    xmlVer = dom.getElementsByTagName('sys_version').toxml()
AttributeError: 'NodeList' object has no attribute 'toxml'

If I change .toxml() to [0].toxml() I can get the first element, but I need all of them. Any ideas?

Also, if I try something like this, I get no output at all:

response = urllib.urlopen(rest_api + rest_date + '/sys_serial_no/700000667725')

DOMTree = xml.dom.minidom.parse(response)
collection = DOMTree.documentElement

if collection.hasAttribute("results"):
   print collection.getAttribute("sys_version")

The raw data looks like the following.
The repeating part of the XML is:

<xml><status request_id="58f39198-2c76-4e87-8e00-f7dd7e69519f1416354337206" response_time="00:00:00:833"></status><results start="1" limit="1000" total_results_count="1" results_count="1"><br/><system><tests start="1" limit="50" total_results_count="18" results_count="18"><test>    <biz_key>C|BF02F1A3-3C4E-11DC-8AAE-0015171BBD90|8594169899|700000667725</biz_key><test_id>2014071922090465</test_id><test_subject>HA Group Notification (WEEKLY_LOG) INFO</test_subject><test_type>DOT-REGULAR</test_type><asup_gen_date>2014-07-20T00:21:40-04:00</asup_gen_date><test_received_date>Sat Jul 19 22:09:19 PDT 2014</test_received_date><test_gen_zone>EDT</test_gen_zone><test_is_minimal>false</test_is_minimal><sys_version>9.2.2X22</sys_version><sys_operating_mode>Cluster-Mode</sys_operating_mode><hostname>rerfdsgt</hostname><sys_domain>test.com</sys_domain><cluster_name>bbrtp</cluster_name>  ... etc


<xml>
  <results>
    <system>
     -<sys_version>
      <asup> 
       -<asup_gen_date>


I only want to extract sys_version and asup_gen_date, like this:

9.2.2X22   2014-07-20 00:21:40
9.2.2X21   2014-06-31 12:51:40 
8.5.2X1    2014-07-20 04:33:22

1 answer:

Answer 0: (score: 0)

getElementsByTagName() returns a NodeList, not a single element, so you need to loop over its result:

for version in dom.getElementsByTagName('sys_version'):
    version = version.toxml()
    version = version.replace('<sys_version>','').replace('</sys_version>','')
    print version
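
To make the cause of the AttributeError concrete, here is a minimal sketch. The sample string is a hypothetical, cut-down version of the XML in the question, and the variable names are only illustrative. It shows that the NodeList itself has no toxml() method, while each element inside it does. It uses print() with a single argument, so it should run on both Python 2.7 and Python 3.

```python
from xml.dom.minidom import parseString

# Hypothetical sample data, reduced from the structure shown in the question.
sample = (
    "<xml><results>"
    "<system><sys_version>9.2.2X22</sys_version></system>"
    "<system><sys_version>8.5.2X1</sys_version></system>"
    "</results></xml>"
)

dom = parseString(sample)
nodes = dom.getElementsByTagName('sys_version')

# The NodeList supports len() and indexing, but it has no toxml() method.
has_toxml = hasattr(nodes, 'toxml')

# Each item in the list is an Element, which does provide toxml().
serialized = [node.toxml() for node in nodes]

print(len(nodes))
print(has_toxml)
print(serialized)
```

Indexing with [0] works for the same reason: it selects one Element from the list, which is why only the first match was returned.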

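Also, rather than stripping the opening and closing tags by hand, you may want to use a getText() helper that joins the text nodes of an element:
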
def getText(nodelist):
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)

for version in dom.getElementsByTagName('sys_version'):
    print getText(version.childNodes)
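
To get the version and date side by side, as in the desired output, the two node lists can be walked together. This is only a sketch under a few assumptions: the sample XML is hypothetical and reduced, every record is assumed to contain both tags in the same order (so zip() pairs them correctly), and the names get_text and extract_rows are made up for this example. The date clean-up reuses the replace('T', ' ') and [:-6] steps from the question.

```python
from xml.dom.minidom import parseString


def get_text(nodelist):
    """Concatenate the data of all text nodes in nodelist."""
    return ''.join(n.data for n in nodelist if n.nodeType == n.TEXT_NODE)


def extract_rows(xml_string):
    """Return (version, date) pairs; assumes both tags occur equally often."""
    dom = parseString(xml_string)
    versions = dom.getElementsByTagName('sys_version')
    dates = dom.getElementsByTagName('asup_gen_date')
    rows = []
    for version, date in zip(versions, dates):
        # '2014-07-20T00:21:40-04:00' -> '2014-07-20 00:21:40'
        stamp = get_text(date.childNodes).replace('T', ' ')[:-6]
        rows.append((get_text(version.childNodes), stamp))
    return rows


# Hypothetical sample data, reduced from the structure shown in the question.
sample = (
    "<xml><results><system><tests>"
    "<test><asup_gen_date>2014-07-20T00:21:40-04:00</asup_gen_date>"
    "<sys_version>9.2.2X22</sys_version></test>"
    "<test><asup_gen_date>2014-07-20T04:33:22-04:00</asup_gen_date>"
    "<sys_version>8.5.2X1</sys_version></test>"
    "</tests></system></results></xml>"
)

for version, stamp in extract_rows(sample):
    print("%s   %s" % (version, stamp))
```

If some records lack one of the two tags, zip() would pair values from different records, so iterating over the enclosing element is the safer choice in that case.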

Another point: parsing the XML with xml.etree.ElementTree is easier and more pleasant, for example:

import xml.etree.ElementTree as ET

tree = ET.parse(response)
root = tree.getroot()

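# sys_version is nested below the root, so search all descendants;
# findall('sys_version') would only match direct children of the root.
for version in root.iter('sys_version'):
    print version.text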
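
A fuller ElementTree sketch that produces both columns might look like the following. It assumes, as in the raw data from the question, that sys_version and asup_gen_date are both children of a test element. The sample string is hypothetical, and ET.fromstring() stands in for ET.parse(response) because the sample is a string and not a file-like object. It also shows why a bare findall('sys_version') on the root returns nothing for nested tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, reduced from the structure shown in the question.
sample = (
    '<xml><results results_count="2"><system><tests>'
    '<test><asup_gen_date>2014-07-20T00:21:40-04:00</asup_gen_date>'
    '<sys_version>9.2.2X22</sys_version></test>'
    '<test><asup_gen_date>2014-07-20T04:33:22-04:00</asup_gen_date>'
    '<sys_version>8.5.2X1</sys_version></test>'
    '</tests></system></results></xml>'
)

root = ET.fromstring(sample)

# A bare tag name only matches direct children of the root element.
direct = root.findall('sys_version')
# The './/' prefix searches all descendants instead.
nested = root.findall('.//sys_version')

rows = []
for test in root.iter('test'):
    version = test.findtext('sys_version')
    date = test.findtext('asup_gen_date')
    # '2014-07-20T00:21:40-04:00' -> '2014-07-20 00:21:40'
    rows.append((version, date.replace('T', ' ')[:-6]))

for version, stamp in rows:
    print("%s   %s" % (version, stamp))
```

Reading both values from the same test element keeps each version paired with its own date, even when the number of matches differs between the two tags elsewhere in the document.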