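Parsing an XML DOM into a Python list using minidom

Asked: 2014-02-10 22:13:04

Tags: python xml

I have an XML file that I fetched from a website. I have loaded the XML into a DOM and can get most of the information I need from it, except for the following: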

<response>
  <result name="response" numFound="2567888" start="0">
    <doc>
      <int name="ImageCount">3</int>
      <arr name="Images">
        <str>binder/jnws/jnws40/images/p1120.jpg</str>
        <str>binder/jnws/jnws40/images/g0753.jpg</str>
        <str>binder/jnws/jnws40/images/p0754.jpg</str>
      </arr>
    </doc>
  </result>
</response>

My code is:

for node in solardom.getElementsByTagName('doc'):
  # Get the image and video counts for this doc element
  imageCount = int(getMyElementValue(node, "int", "ImageCount"))
  videoCount = int(getMyElementValue(node, "int", "VideoCount"))
  if imageCount > 0:
    print "Image Count is: " + str(imageCount)
    imageList = getMyList(node, "arr", "Images", imageCount)

def getMyList(n, ntype, s, num):
  list = []
  i = 0
  for node in n.getElementsByTagName(ntype):
    if node.getAttribute("name") == s:
      print "Found Image Path!!!"

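I can see that I am at the right level in the XML, but I cannot figure out how to get the string values of the image paths into a Python list.

Thanks for any help or pointers you can give me. Jack

3 Answers:

Answer 0 (score: 0)

Try the xmltodict module.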

>>> import xmltodict
>>> obj = xmltodict.parse(xml)
>>> print(obj['response']['result']['doc']['arr']['str'])
['binder/jnws/jnws40/images/p1120.jpg', 'binder/jnws/jnws40/images/g0753.jpg', 'binder/jnws/jnws40/images/p0754.jpg']

Answer 1 (score: 0)

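Try return [child.firstChild.nodeValue for child in node.getElementsByTagName('str')] inside the if block of getMyList. Iterating over node.childNodes directly would also visit the whitespace text nodes between the str elements, and nodeValue is None for element nodes.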
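
A minimal sketch of how getMyList from the question could be completed with minidom. It is written for Python 3 (the question uses Python 2 print statements), the name get_my_list and the sample string XML are illustrative, and it assumes each str element holds a single text node. Because the childNodes of the arr element also include whitespace text nodes, the sketch selects the str elements by tag name.

```python
from xml.dom import minidom

# Sample document copied from the question.
XML = """<response>
  <result name="response" numFound="2567888" start="0">
    <doc>
      <int name="ImageCount">3</int>
      <arr name="Images">
        <str>binder/jnws/jnws40/images/p1120.jpg</str>
        <str>binder/jnws/jnws40/images/g0753.jpg</str>
        <str>binder/jnws/jnws40/images/p0754.jpg</str>
      </arr>
    </doc>
  </result>
</response>"""


def get_my_list(doc_node, tag, name):
    """Collect the text of every <str> under the <tag name="name"> element."""
    values = []
    for node in doc_node.getElementsByTagName(tag):
        if node.getAttribute("name") != name:
            continue
        for child in node.getElementsByTagName("str"):
            # Assumption: each <str> holds one text node; skip empty elements.
            if child.firstChild is not None:
                values.append(child.firstChild.nodeValue)
    return values


dom = minidom.parseString(XML)
for doc in dom.getElementsByTagName("doc"):
    image_list = get_my_list(doc, "arr", "Images")
    print(image_list)
```

The num argument from the original signature is dropped here because the list length already gives the count.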

Answer 2 (score: 0)

OK, try this:

xml = '''
  <response>
  <result name="response" numFound="2567888" start="0">
    <doc>
      <int name="ImageCount">3</int>
      <arr name="Images">
        <str>binder/jnws/jnws40/images/p1120.jpg</str>
        <str>binder/jnws/jnws40/images/g0753.jpg</str>
        <str>binder/jnws/jnws40/images/p0754.jpg</str>
      </arr>
    </doc>
  </result>
 </response>
'''

>>> import xml.etree.ElementTree as ET
>>> root = ET.fromstring(xml)
>>> imgs = [img.text for img in root.findall(".//*[@name='Images']/str")]
>>> imgs
['binder/jnws/jnws40/images/p1120.jpg', 'binder/jnws/jnws40/images/g0753.jpg', 'binder/jnws/jnws40/images/p0754.jpg']

You can read more here.
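
Since the real response presumably contains many doc elements, the same ElementTree idea can be applied per doc rather than across the whole tree. The following is a sketch under that assumption; the function name images_per_doc and the sample string XML are illustrative, and a doc without an ImageCount element is treated as having a count of 0.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question.
XML = """<response>
  <result name="response" numFound="2567888" start="0">
    <doc>
      <int name="ImageCount">3</int>
      <arr name="Images">
        <str>binder/jnws/jnws40/images/p1120.jpg</str>
        <str>binder/jnws/jnws40/images/g0753.jpg</str>
        <str>binder/jnws/jnws40/images/p0754.jpg</str>
      </arr>
    </doc>
  </result>
</response>"""


def images_per_doc(xml_text):
    """Return one (image_count, image_paths) tuple for each <doc> element."""
    root = ET.fromstring(xml_text)
    results = []
    for doc in root.iter("doc"):
        # Assumption: a missing ImageCount element means no images.
        count = int(doc.findtext("int[@name='ImageCount']", default="0"))
        # Paths are relative to the current <doc>, so docs are not mixed up.
        paths = [s.text for s in doc.findall("arr[@name='Images']/str")]
        results.append((count, paths))
    return results


for count, paths in images_per_doc(XML):
    print(count, paths)
```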