Question

import xml.dom.minidom

water = """
<channel>
<item>
<title>water</title>
<link>http://www.water.com</link>
</item>
<item>
<title>fire</title>
<link>http://www.fire.com</link>
</item>
</channel>"""

dom=xml.dom.minidom.parseString(water)
linklist = dom.getElementsByTagName('link')
print (len(linklist))

Using minidom, I want to get the content between <link> and </link> as a string. Please let me know how.

Answer 1

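If you want to stick with xml.dom.minidom, just call .firstChild.nodeValue. For example, you stored the links in the variable "linklist", so to print them, simply iterate over them and call .firstChild.nodeValue on each, like this...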

for link in linklist:
    print(link.firstChild.nodeValue)

...which prints

http://www.water.com
http://www.fire.com
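
One caveat with .firstChild.nodeValue: an element with no content, such as <link></link>, has no child node, so firstChild is None and the attribute access raises an error. Below is a minimal sketch of a guard. The helper name link_text and the third, empty "air" item are made up for illustration and are not part of the question.

```python
import xml.dom.minidom

# Same shape as the question's data, plus a hypothetical item with an empty <link>.
doc = """<channel>
<item><title>water</title><link>http://www.water.com</link></item>
<item><title>fire</title><link>http://www.fire.com</link></item>
<item><title>air</title><link></link></item>
</channel>"""

def link_text(element):
    """Return the element's first text child, or '' if there is none."""
    child = element.firstChild
    if child is None or child.nodeType != child.TEXT_NODE:
        return ''
    return child.nodeValue

dom = xml.dom.minidom.parseString(doc)
texts = [link_text(link) for link in dom.getElementsByTagName('link')]
print(texts)
```

Returning an empty string is only one possible choice here; returning None instead would let callers tell "empty" apart from "missing text".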

There is a more detailed answer here: Get Element value with minidom with Python

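To answer your other question:
If you want to get a specific element, you need to either know its position in the document or search for it.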

For example, if you know the link you want is the second link in the XML document, you would do...

# the variable fire_link is a DOM Element of the second link in the xml file
fire_link = linklist[1]

However, if you want a link but do not know where it is in the document, you have to search for it. Here is an example...

# fire_links is a list of the DOM Elements whose text is the http://www.fire.com link
fire_links = [l for l in linklist if l.firstChild.nodeValue == 'http://www.fire.com']

# take the first element
fire_link = fire_links[0]
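
Note that indexing with [0] raises IndexError when nothing matches. A possible variant, sketched here with a hypothetical helper find_link and a made-up URL for the no-match case, uses next() with a default so that a missing link yields None instead:

```python
import xml.dom.minidom

water = """<channel>
<item><title>water</title><link>http://www.water.com</link></item>
<item><title>fire</title><link>http://www.fire.com</link></item>
</channel>"""

dom = xml.dom.minidom.parseString(water)
linklist = dom.getElementsByTagName('link')

def find_link(links, url):
    """Return the first element whose first text child equals url, else None."""
    return next(
        (link for link in links
         if link.firstChild is not None and link.firstChild.nodeValue == url),
        None,
    )

fire_link = find_link(linklist, 'http://www.fire.com')
missing = find_link(linklist, 'http://www.earth.com')  # hypothetical URL, not in the data

print(fire_link.toxml())
print(missing)
```

The generator stops at the first match, so the whole list is not scanned when the link appears early.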

Answer 2

This is more complicated than it looks. Taking the example from the documentation, append this to the code in your question:

def getText(nodelist):
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)

text = getText(linklist[0].childNodes)
print(text)
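
To see why joining all text nodes can matter, here is a small sketch on a contrived input (not from the question) in which a comment splits the element's text into two text nodes. In that case .firstChild.nodeValue returns only the first piece, while getText returns the whole string:

```python
import xml.dom.minidom

def getText(nodelist):
    """Concatenate the data of every text node in nodelist."""
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)

# Contrived input: the comment separates the text into two child text nodes.
dom = xml.dom.minidom.parseString(
    '<link>http://www.<!-- note -->water.com</link>'
)
link = dom.getElementsByTagName('link')[0]

first_only = link.firstChild.nodeValue  # only the text before the comment
joined = getText(link.childNodes)       # all text nodes joined together

print(first_only)
print(joined)
```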

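I would suggest trying the elementtree module instead, where (with linklist obtained from an ElementTree parse rather than from minidom) the code is: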

print(linklist[0].text)
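
For completeness, a minimal sketch of how linklist could be built with the standard library's xml.etree.ElementTree so that .text works. Using iter('link') to collect every link element is one possible choice that mirrors getElementsByTagName; it is an assumption, not something the answer specifies.

```python
import xml.etree.ElementTree as ET

water = """
<channel>
<item>
<title>water</title>
<link>http://www.water.com</link>
</item>
<item>
<title>fire</title>
<link>http://www.fire.com</link>
</item>
</channel>"""

root = ET.fromstring(water)          # root is the <channel> element
linklist = list(root.iter('link'))   # every <link> element, in document order

print(linklist[0].text)

# Collect the text of all links as plain strings.
links = [link.text for link in linklist]
print(links)
```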

How do I get the content between two XML tags in Python?
