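I have an XML file that I fetch from a website. I've loaded the XML into a DOM and can get most of the information I need from it, except for the following: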
<response>
<result name="response" numFound="2567888" start="0">
<doc>
<int name="ImageCount">3</int>
<arr name="Images">
<str>binder/jnws/jnws40/images/p1120.jpg</str>
<str>binder/jnws/jnws40/images/g0753.jpg</str>
<str>binder/jnws/jnws40/images/p0754.jpg</str>
</arr>
</doc>
</result>
</response>
My code is:
for node in solardom.getElementsByTagName('doc'):
    # Get the Image Count & Video Counts for this doc element ...
    imageCount = int(getMyElementValue(node, "int", "ImageCount"))
    videoCount = int(getMyElementValue(node, "int", "VideoCount"))
    if imageCount > 0:
        print "Image Count is: " + str(imageCount)
        imageList = getMyList(node, "arr", "Images", imageCount)

def getMyList(n, ntype, s, num):
    list = []
    i = 0
    for node in n.getElementsByTagName(ntype):
        if node.getAttribute("name") == s:
            print "Found Image Path!!!"
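I can see that I'm at the right level in the XML, but I can't figure out how to get the string values of the image paths into a Python list.
Thanks for any help or pointers you can give me. Jack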
Answer 0 (score: 0)
Try the xmltodict module.
>>> import xmltodict
>>> obj = xmltodict.parse(xml)
>>> print(obj['response']['result']['doc']['arr']['str'])
['binder/jnws/jnws40/images/p1120.jpg', 'binder/jnws/jnws40/images/g0753.jpg', 'binder/jnws/jnws40/images/p0754.jpg']
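Answer 1 (score: 0)
Inside the if block, try return [child.firstChild.nodeValue for child in node.getElementsByTagName('str')]. Iterating node.childNodes directly does not give the paths: the <str> children are element nodes, whose nodeValue is None, and the line breaks between them show up as extra text nodes.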
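A minimal xml.dom.minidom sketch of a complete getMyList, assuming each path sits in a single text node directly under its <str> element. The names get_my_list and text_of are hypothetical; the unused num argument from the question is dropped because the number of <str> children already gives the count.

```python
from xml.dom import minidom, Node

XML = """<response>
<result name="response" numFound="2567888" start="0">
<doc>
<int name="ImageCount">3</int>
<arr name="Images">
<str>binder/jnws/jnws40/images/p1120.jpg</str>
<str>binder/jnws/jnws40/images/g0753.jpg</str>
<str>binder/jnws/jnws40/images/p0754.jpg</str>
</arr>
</doc>
</result>
</response>"""


def text_of(element):
    """Join the text nodes that sit directly under an element."""
    return "".join(
        child.data for child in element.childNodes
        if child.nodeType == Node.TEXT_NODE
    )


def get_my_list(n, ntype, s):
    """Return the text of every <str> inside the first <ntype name="s"> under n."""
    for node in n.getElementsByTagName(ntype):
        if node.getAttribute("name") == s:
            # Only the <str> elements hold paths; the whitespace text nodes
            # between them are never visited.
            return [text_of(item) for item in node.getElementsByTagName("str")]
    return []  # no matching <arr> in this doc


dom = minidom.parseString(XML)
for doc in dom.getElementsByTagName("doc"):
    image_list = get_my_list(doc, "arr", "Images")
    print(image_list)
```

Reading the text nodes explicitly, rather than relying on firstChild, also copes with an empty <str/> (it yields "" instead of raising AttributeError on None).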
Answer 2 (score: 0)
OK, try this:
xml = '''
<response>
<result name="response" numFound="2567888" start="0">
<doc>
<int name="ImageCount">3</int>
<arr name="Images">
<str>binder/jnws/jnws40/images/p1120.jpg</str>
<str>binder/jnws/jnws40/images/g0753.jpg</str>
<str>binder/jnws/jnws40/images/p0754.jpg</str>
</arr>
</doc>
</result>
</response>
'''
>>> import xml.etree.ElementTree as ET
>>> root = ET.fromstring(xml)
>>> imgs = [img.text for img in root.findall(".//*[@name='Images']/str")]
>>> imgs
['binder/jnws/jnws40/images/p1120.jpg', 'binder/jnws/jnws40/images/g0753.jpg', 'binder/jnws/jnws40/images/p0754.jpg']
You can read more here.
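Since the real response can hold many <doc> elements, here is a sketch of the same ElementTree approach applied per document, so that each image list stays paired with its own counts. The function name images_per_doc and the dictionary keys are hypothetical. findtext's default covers a doc without a VideoCount field, as in the sample XML. This assumes each count element, when present, contains an integer.

```python
import xml.etree.ElementTree as ET

XML = """<response>
<result name="response" numFound="2567888" start="0">
<doc>
<int name="ImageCount">3</int>
<arr name="Images">
<str>binder/jnws/jnws40/images/p1120.jpg</str>
<str>binder/jnws/jnws40/images/g0753.jpg</str>
<str>binder/jnws/jnws40/images/p0754.jpg</str>
</arr>
</doc>
</result>
</response>"""


def images_per_doc(xml_text):
    """Return one dict per <doc> with its image count, video count and paths."""
    root = ET.fromstring(xml_text)
    results = []
    for doc in root.iter("doc"):
        # findtext returns the default when no element matches the path.
        images = int(doc.findtext("int[@name='ImageCount']", default="0"))
        videos = int(doc.findtext("int[@name='VideoCount']", default="0"))
        # Only the <str> children of this doc's Images array are collected.
        paths = [s.text for s in doc.findall("arr[@name='Images']/str")]
        results.append({"images": images, "videos": videos, "paths": paths})
    return results


for entry in images_per_doc(XML):
    if entry["images"] > 0:
        print("Image Count is:", entry["images"])
        print(entry["paths"])
```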