import xml.dom.minidom
content = """
<urlset xmlns="http://www.google.com/schemas/sitemap/0.90">
<url>
<loc>http://www.domain.com/</loc>
<lastmod>2011-01-27T23:55:42+01:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>http://www.domain.com/page1.html</loc>
<lastmod>2011-01-26T17:24:27+01:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>http://www.domain.com/page2.html</loc>
<lastmod>2011-01-26T15:35:07+01:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.5</priority>
</url>
</urlset>
"""
# Parse into "dom" so the xml package name is not shadowed
dom = xml.dom.minidom.parseString(content)
urlset = dom.getElementsByTagName("urlset")[0]
url = urlset.getElementsByTagName("url")
for i in range(0, url.length):
    loc = url[i].getElementsByTagName("loc")[0].childNodes[0].nodeValue
    lastmod = url[i].getElementsByTagName("lastmod")[0].childNodes[0].nodeValue
    changefreq = url[i].getElementsByTagName("changefreq")[0].childNodes[0].nodeValue
    priority = url[i].getElementsByTagName("priority")[0].childNodes[0].nodeValue
    print("%s, %s, %s, %s" % (loc, lastmod, changefreq, priority))
Is there a simpler way to get a node's value than this?
loc = url[i].getElementsByTagName("loc")[0].childNodes[0].nodeValue
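One possibly simpler route, assuming a switch from minidom to the standard library's xml.etree.ElementTree is acceptable: findtext() returns the text of the first matching child directly. The "sm" prefix and the FIELDS tuple are illustrative names, not part of the sitemap format; the sketch is trimmed to two entries.

```python
import xml.etree.ElementTree as ET

# The sitemap declares a default namespace, so ElementTree sees tags such as
# "{http://www.google.com/schemas/sitemap/0.90}url". "sm" is an arbitrary local
# alias for that URI, used only in the lookups below.
NS = {"sm": "http://www.google.com/schemas/sitemap/0.90"}
FIELDS = ("loc", "lastmod", "changefreq", "priority")

content = """<urlset xmlns="http://www.google.com/schemas/sitemap/0.90">
<url>
<loc>http://www.domain.com/</loc>
<lastmod>2011-01-27T23:55:42+01:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>http://www.domain.com/page1.html</loc>
<lastmod>2011-01-26T17:24:27+01:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.5</priority>
</url>
</urlset>"""

root = ET.fromstring(content)
rows = []
for url in root.findall("sm:url", NS):
    # findtext() returns the text of the first matching child element,
    # or None when that child is missing.
    row = tuple(url.findtext("sm:" + field, namespaces=NS) for field in FIELDS)
    rows.append(row)
    print("%s, %s, %s, %s" % row)
```

The trade-off is that namespaced tags must be qualified in every lookup, whereas minidom's getElementsByTagName matches the unprefixed tag name here.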
Answer 0 (score: 0)
Does this work?
loc = getElementsByTagName("loc")[i].innerHTML
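minidom elements have no innerHTML attribute (that comes from the browser DOM), and getElementsByTagName has to be called on a document or element. A rough minidom equivalent, sketched with a hypothetical inner_xml helper that serializes an element's children:

```python
import xml.dom.minidom

def inner_xml(node):
    # Hypothetical helper: serialize every child node and join the results,
    # which is roughly what a browser's innerHTML returns for an element.
    return "".join(child.toxml() for child in node.childNodes)

dom = xml.dom.minidom.parseString(
    "<urlset><url><loc>http://www.domain.com/</loc></url>"
    "<url><loc>http://www.domain.com/page1.html</loc></url></urlset>"
)
# Called on the document, so index i walks over every <loc> in the file.
locs = dom.getElementsByTagName("loc")
values = [inner_xml(locs[i]) for i in range(locs.length)]
print(values)
```

For a text-only element this yields the text itself; for mixed content it would also include child markup, which is usually not what "node value" means.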
Answer 1 (score: 0)
There may be a better way to get a node's value, but this is at least a cleaner option that avoids the repetition:
import xml.dom.minidom
content = """
<urlset xmlns="http://www.google.com/schemas/sitemap/0.90">
<url>
<loc>http://www.domain.com/</loc>
<lastmod>2011-01-27T23:55:42+01:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>http://www.domain.com/page1.html</loc>
<lastmod>2011-01-26T17:24:27+01:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>http://www.domain.com/page2.html</loc>
<lastmod>2011-01-26T15:35:07+01:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.5</priority>
</url>
</urlset>
"""
def get_first_node_val(obj, tag):
    # Text of the first <tag> element found under obj
    return obj.getElementsByTagName(tag)[0].childNodes[0].nodeValue

dom = xml.dom.minidom.parseString(content)
urlset = dom.getElementsByTagName("urlset")[0]
urls = urlset.getElementsByTagName("url")
for url in urls:
    loc = get_first_node_val(url, "loc")
    lastmod = get_first_node_val(url, "lastmod")
    changefreq = get_first_node_val(url, "changefreq")
    priority = get_first_node_val(url, "priority")
    print("%s, %s, %s, %s" % (loc, lastmod, changefreq, priority))
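The same helper can also be driven by a list of field names, so each <url> becomes one dict and adding a field means editing one tuple. A sketch assuming a dict per entry suits the caller; FIELDS and records are illustrative names:

```python
import xml.dom.minidom

FIELDS = ("loc", "lastmod", "changefreq", "priority")

def get_first_node_val(obj, tag):
    # Same helper as in the answer: text of the first <tag> under obj.
    return obj.getElementsByTagName(tag)[0].childNodes[0].nodeValue

content = """<urlset xmlns="http://www.google.com/schemas/sitemap/0.90">
<url>
<loc>http://www.domain.com/</loc>
<lastmod>2011-01-27T23:55:42+01:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.5</priority>
</url>
</urlset>"""

dom = xml.dom.minidom.parseString(content)
# One dict per <url>, keyed by tag name.
records = [
    {tag: get_first_node_val(url, tag) for tag in FIELDS}
    for url in dom.getElementsByTagName("url")
]
for rec in records:
    print("%(loc)s, %(lastmod)s, %(changefreq)s, %(priority)s" % rec)
```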
Answer 2 (score: 0)
Why not use firstChild?
loc = url[i].getElementsByTagName("loc")[0].firstChild.nodeValue
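getElementsByTagName returns a NodeList, which has to be indexed before firstChild is available. A minimal sketch of a firstChild-based lookup that also tolerates missing or empty elements; first_text and its default parameter are hypothetical names:

```python
import xml.dom.minidom

def first_text(parent, tag, default=None):
    # Hypothetical helper. getElementsByTagName returns a NodeList,
    # so index it before reading .firstChild.
    matches = parent.getElementsByTagName(tag)
    if not matches:
        return default          # no such element
    first = matches[0].firstChild
    if first is None:
        return default          # element exists but is empty, e.g. <lastmod/>
    return first.nodeValue

dom = xml.dom.minidom.parseString(
    "<url><loc>http://www.domain.com/</loc><lastmod/></url>"
)
url = dom.getElementsByTagName("url")[0]
print(first_text(url, "loc"))               # http://www.domain.com/
print(first_text(url, "lastmod"))           # None
print(first_text(url, "priority", "0.5"))   # 0.5
```

For a non-empty element, firstChild and childNodes[0] refer to the same node; firstChild simply returns None instead of raising IndexError when there are no children.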
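Answer 3 (score: 0)
An extension of get_first_node_val that also handles XML elements in which the same tag appears more than once. For example, the following contains two loc elements: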
<url>
<loc>http://domain.com/</loc>
<loc>http://sub.domain.com</loc>
<lastmod>2011-01-27T23:55:42+01:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.5</priority>
</url>
def get_first_node_val(obj, tag):
    # Despite the name, this now collects the value of every matching
    # element, as a list of one-key dicts such as {"loc": "http://domain.com/"}
    element = []
    for x in obj.getElementsByTagName(tag):
        element.append({tag: x.childNodes[0].nodeValue})
    return element
Output:
[{'loc': u'http://domain.com/'}, {'loc': u'http://sub.domain.com'}], [{'lastmod': u'2011-01-27T23:55:42+01:00'}], [{'changefreq': u'daily'}], [{'priority': u'0.5'}]
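A flatter variant, sketched under the assumption that plain strings are more convenient than one-key dicts; get_node_vals is a hypothetical name. It iterates the NodeList directly rather than re-querying it by index, and skips empty elements:

```python
import xml.dom.minidom

def get_node_vals(obj, tag):
    # Hypothetical variant: one string per matching element, empty ones skipped.
    return [el.firstChild.nodeValue
            for el in obj.getElementsByTagName(tag)
            if el.firstChild is not None]

snippet = """<url>
<loc>http://domain.com/</loc>
<loc>http://sub.domain.com</loc>
<lastmod>2011-01-27T23:55:42+01:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.5</priority>
</url>"""

url = xml.dom.minidom.parseString(snippet).documentElement
result = {tag: get_node_vals(url, tag)
          for tag in ("loc", "lastmod", "changefreq", "priority")}
print(result["loc"])  # ['http://domain.com/', 'http://sub.domain.com']
```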