I am trying to get every instance of an XML tag, but I only ever seem to get one element back, or none at all.
#!/usr/software/bin/python
# import libraries
import urllib
from xml.dom.minidom import parseString
# variables
startdate = "2014-01-01"
enddate = "2014-05-01"
rest_client = "test"
rest_host = "restprd.test.com"
rest_port = "80"
rest_base_url = "asup-rest-interface/ASUP_DATA"
rest_date = "/start_date/%s/end_date/%s/limit/5000" % (startdate,enddate)
rest_api = "http://" + rest_host + ":" + rest_port + "/" + rest_base_url + "/" + "client_id" + "/" + rest_client
response = urllib.urlopen(rest_api + rest_date + '/sys_serial_no/700000667725')
data = response.read()
response.close()
dom = parseString(data)
xmlVer = dom.getElementsByTagName('sys_version').toxml()
xmlDate = dom.getElementsByTagName('asup_gen_date').toxml()
xmlVerTag=xmlVer.replace('<sys_version>','').replace('</sys_version>','')
xmlDateTag=xmlDate.replace('<asup_gen_date>','').replace('</asup_gen_date>','').replace('T',' ')[0:-6]
print xmlDateTag , xmlVerTag
The code above produces the following error:
Traceback (most recent call last):
  File "./test.py", line 23, in <module>
    xmlVer = dom.getElementsByTagName('sys_version').toxml()
AttributeError: 'NodeList' object has no attribute 'toxml'
If I change .toxml() to [0].toxml(), I get the first element, but I need all of them. Any ideas?
Also, if I try the following, I get no output at all:
response = urllib.urlopen(rest_api + rest_date + '/sys_serial_no/700000667725')
DOMTree = xml.dom.minidom.parse(response)
collection = DOMTree.documentElement
if collection.hasAttribute("results"):
    print collection.getAttribute("sys_version")
The raw data looks like this, followed by an outline of the repeating part of the XML:
<xml><status request_id="58f39198-2c76-4e87-8e00-f7dd7e69519f1416354337206" response_time="00:00:00:833"></status><results start="1" limit="1000" total_results_count="1" results_count="1"><br/><system><tests start="1" limit="50" total_results_count="18" results_count="18"><test> <biz_key>C|BF02F1A3-3C4E-11DC-8AAE-0015171BBD90|8594169899|700000667725</biz_key><test_id>2014071922090465</test_id><test_subject>HA Group Notification (WEEKLY_LOG) INFO</test_subject><test_type>DOT-REGULAR</test_type><asup_gen_date>2014-07-20T00:21:40-04:00</asup_gen_date><test_received_date>Sat Jul 19 22:09:19 PDT 2014</test_received_date><test_gen_zone>EDT</test_gen_zone><test_is_minimal>false</test_is_minimal><sys_version>9.2.2X22</sys_version><sys_operating_mode>Cluster-Mode</sys_operating_mode><hostname>rerfdsgt</hostname><sys_domain>test.com</sys_domain><cluster_name>bbrtp</cluster_name> ... etc
<xml>
<results>
<system>
-<sys_version>
<asup>
-<asup_gen_date>
I only want to extract sys_version and asup_gen_date, like this:
9.2.2X22 2014-07-20 00:21:40
9.2.2X21 2014-06-31 12:51:40
8.5.2X1 2014-07-20 04:33:22
Answer 0 (score: 0)
getElementsByTagName() returns a NodeList, not a single node, so you need to loop over it:
for version in dom.getElementsByTagName('sys_version'):
    version = version.toxml()
    version = version.replace('<sys_version>','').replace('</sys_version>','')
    print version
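To get the paired output the question asks for, the two node lists can be walked in step. The following is only a minimal sketch, assuming every sys_version has a matching asup_gen_date in the same document order. The sample_xml string is a trimmed, hypothetical stand-in for the real response, version_date_pairs is an illustrative name, and print() is written in function form so the sketch also runs on Python 3.

```python
from xml.dom.minidom import parseString

# Hypothetical, trimmed stand-in for the REST response.
sample_xml = (
    "<xml><results><system><tests>"
    "<test><asup_gen_date>2014-07-20T00:21:40-04:00</asup_gen_date>"
    "<sys_version>9.2.2X22</sys_version></test>"
    "<test><asup_gen_date>2014-07-20T04:33:22-04:00</asup_gen_date>"
    "<sys_version>8.5.2X1</sys_version></test>"
    "</tests></system></results></xml>"
)

def version_date_pairs(xml_text):
    """Return a list of (sys_version, asup_gen_date) string pairs."""
    dom = parseString(xml_text)
    versions = dom.getElementsByTagName('sys_version')
    dates = dom.getElementsByTagName('asup_gen_date')
    pairs = []
    # Assumption: both lists have the same length and order.
    for version, date in zip(versions, dates):
        # firstChild is the text node inside the element.
        version_text = version.firstChild.data
        # Drop the UTC offset (last 6 characters) and the 'T' separator.
        date_text = date.firstChild.data[:-6].replace('T', ' ')
        pairs.append((version_text, date_text))
    return pairs

for version_text, date_text in version_date_pairs(sample_xml):
    print(version_text + ' ' + date_text)
```

Reading firstChild.data avoids the toxml() and replace() round trip, but it assumes each element contains exactly one text node.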
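You may also want to extract the text with a getText() helper like the one in the xml.dom.minidom documentation, instead of stripping the tags by hand:
def getText(nodelist):
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)

for version in dom.getElementsByTagName('sys_version'):
    print getText(version.childNodes)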
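As for the second snippet in the question, it most likely prints nothing because hasAttribute() and getAttribute() look at attributes of an element, while results and sys_version are child elements in the sample data. The sketch below illustrates the difference on a trimmed, hypothetical stand-in for the response; the variable names are illustrative only.

```python
from xml.dom.minidom import parseString

# Hypothetical, trimmed stand-in for the REST response.
sample_xml = (
    '<xml><results start="1" limit="1000">'
    '<system><sys_version>9.2.2X22</sys_version></system>'
    '</results></xml>'
)

root = parseString(sample_xml).documentElement

# 'results' is a child element of <xml>, not an attribute of it,
# so this check is false and the if-branch in the question never runs.
has_results_attr = root.hasAttribute('results')

results = root.getElementsByTagName('results')[0]
# 'limit' really is an attribute, so getAttribute() works here.
limit = results.getAttribute('limit')

# sys_version is an element, so read its text node instead.
versions = [node.firstChild.data
            for node in root.getElementsByTagName('sys_version')]

print(has_results_attr)
print(limit)
print(versions)
```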
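Another point: parsing XML with xml.etree.ElementTree is easier and more pleasant, for example:
import xml.etree.ElementTree as ET
tree = ET.parse(response)
root = tree.getroot()
# iter() walks the whole tree; findall('sys_version') would only
# match direct children of the root element
for version in root.iter('sys_version'):
    print version.text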
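Building on the ElementTree approach, the two fields can also be read record by record, which avoids mis-pairing when one of them is missing. This is a sketch under assumptions: it takes the test element from the sample data as the per-record wrapper, and extract_records, record_tag and sample_xml are hypothetical names and data introduced only for illustration. It uses print() in function form so it runs on Python 3 as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for the REST response.
# The second record deliberately lacks sys_version.
sample_xml = (
    "<xml><results><system><tests>"
    "<test><asup_gen_date>2014-07-20T00:21:40-04:00</asup_gen_date>"
    "<sys_version>9.2.2X22</sys_version></test>"
    "<test><asup_gen_date>2014-07-21T10:00:00-04:00</asup_gen_date></test>"
    "<test><asup_gen_date>2014-07-20T04:33:22-04:00</asup_gen_date>"
    "<sys_version>8.5.2X1</sys_version></test>"
    "</tests></system></results></xml>"
)

def extract_records(xml_text, record_tag='test'):
    """Return (sys_version, date) pairs, one per complete record."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter(record_tag):
        # findtext() returns None when the child element is absent.
        version = record.findtext('sys_version')
        stamp = record.findtext('asup_gen_date')
        if version is None or stamp is None:
            continue  # skip incomplete records
        # Drop the UTC offset and the 'T' separator, as in the question.
        rows.append((version, stamp[:-6].replace('T', ' ')))
    return rows

for version, stamp in extract_records(sample_xml):
    print(version + ' ' + stamp)
```

If the real response nests the two fields under different parents, as the outline in the question hints, record_tag would need to name their common ancestor and findtext() would need a path such as './/sys_version'.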