Question

I have the following XML.

I am using the ElementTree library to extract the values.

<?xml version="1.0" encoding="UTF-8"?>

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <url>
  <loc>Test 1</loc>
 </url>
 <url>
  <loc>Test 2</loc>
 </url>
 <url>
  <loc>Test 3</loc>
 </url>
</urlset>

I need to get the values from the 'loc' tags.

Expected output:

Test 1
Test 2
Test 3

Code I tried:

import xml.etree.ElementTree as ET

tree = ET.parse('sitemap.xml')
root = tree.getroot()
for atype in root.findall('url'):
    rank = atype.find('loc').text
    print(rank)

Any suggestions on where I went wrong?

Answer 1

Your XML has a default namespace (http://www.sitemaps.org/schemas/sitemap/0.9), so you have to qualify every tag name in your code with it:

import xml.etree.ElementTree as ET

tree = ET.parse('sitemap.xml')
root = tree.getroot()
for atype in root.findall('{http://www.sitemaps.org/schemas/sitemap/0.9}url'):
    rank = atype.find('{http://www.sitemaps.org/schemas/sitemap/0.9}loc').text
    print(rank)

Or define a namespace mapping:

import xml.etree.ElementTree as ET

nsmap = {"ns": "http://www.sitemaps.org/schemas/sitemap/0.9"}

tree = ET.parse('sitemap.xml')
root = tree.getroot()
for atype in root.findall('ns:url', nsmap):
    rank = atype.find('ns:loc', nsmap).text
    print(rank)
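
To see why the unqualified lookup finds nothing, here is a minimal sketch that parses an inline sample modeled on the sitemap in the question, so it runs without a file. SITEMAP and NS are names made up for this example, and the strip() call is my own addition to drop stray whitespace around the values.

```python
import xml.etree.ElementTree as ET

# Inline sample modeled on the question's sitemap (assumption: same structure).
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <url>
  <loc> Test 1</loc>
 </url>
 <url>
  <loc>Test 2</loc>
 </url>
 <url>
  <loc>Test 3</loc>
 </url>
</urlset>"""

# The prefix "ns" is arbitrary; only the URI has to match the document.
NS = {"ns": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(SITEMAP)

# The default namespace becomes part of every tag name.
print(root.tag)

# An unqualified name therefore matches no children.
unqualified = root.findall("url")
print(unqualified)

# A qualified path through the prefix map finds the loc elements.
locs = [loc.text.strip() for loc in root.findall("ns:url/ns:loc", NS)]
print(locs)
```

Printing root.tag is a quick way to check whether a document uses a default namespace before you write any find() calls.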

Answer 2

from lxml import etree

tree = etree.parse('sitemap.xml')
for element in tree.iter('*'):
    # element.text is None for elements without text, so check it first
    if element.text and 'Test' in element.text:
        print(element.text)

Probably not the most beautiful solution, but it works :)
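
A variation on the same idea, offered as a sketch rather than a drop-in replacement: filter on the tag name instead of the text, so it does not depend on the values containing 'Test'. It uses the standard library's ElementTree instead of lxml and an inline sample modeled on the question's sitemap. SITEMAP and local_name are hypothetical names introduced for this example.

```python
import xml.etree.ElementTree as ET

# Inline sample modeled on the question's sitemap (assumption: same structure).
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <url>
  <loc>Test 1</loc>
 </url>
 <url>
  <loc>Test 2</loc>
 </url>
 <url>
  <loc>Test 3</loc>
 </url>
</urlset>"""


def local_name(tag):
    """Return the tag without its '{namespace}' prefix, if it has one."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(SITEMAP)

# Walk every element and keep the text of those whose local name is 'loc'.
values = [
    element.text.strip()
    for element in root.iter()
    if local_name(element.tag) == "loc" and element.text
]

for value in values:
    print(value)
```

This ignores the namespace entirely, which is convenient for a quick script but would also match a 'loc' element from any other namespace.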

Get values from child nodes in XML with Python
