How to remove attribute from root element in Python xml etree ElementTree

Time: 2016-12-02 04:52:39

Tags: xml python-2.7 xml-parsing elementtree xml-sitemap

My file contains the following data:

Original:

<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <url> <changefreq>daily</changefreq> <loc>http://www.example.com</loc></url></urlset>

Expected:

<?xml version="1.0" encoding="UTF-8"?><urlset> <url> <changefreq>daily</changefreq> <loc>http://www.example.com</loc></url></urlset>

I use etree to parse the file, and I want to remove the attribute from the root element 'urlset':

import xml.etree.ElementTree as ET

tree = ET.parse("/Users/hsyang/Downloads/VI-0-11-14-2016_20.xml")
root = tree.getroot()

print root.attrib
>> {}

root.attrib.pop("xmlns", None)

print root.attrib
>> {}
ET.tostring(root)

I expected to get {xmlns:"http://www.sitemaps.org/schemas/sitemap/0.9"} the first time I print root.attrib, but I got an empty dictionary instead. Can someone help?

Appreciate it!

2 answers:

Answer 0 (score: 1)

xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" looks like a regular attribute, but it is a special case: a namespace declaration.

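Removing, adding, or modifying namespaces can be quite hard. "Normal" attributes are stored in an element's writable attrib property. Namespace mappings, on the other hand, are not readily available via the API (in the lxml library, elements do have an nsmap property, but it is read-only).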
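To see where the declaration ends up, here is a minimal sketch that parses the question's document from an inline string (the variable name xml_text is just for illustration). It suggests that ElementTree folds the namespace URI into each tag name rather than keeping it in attrib:

```python
import xml.etree.ElementTree as ET

# The sitemap document from the question, inlined as a string for illustration.
xml_text = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    '<url><changefreq>daily</changefreq>'
    '<loc>http://www.example.com</loc></url></urlset>'
)

root = ET.fromstring(xml_text)

# The xmlns declaration is not stored as an attribute...
print(root.attrib)   # {}

# ...it is folded into every tag name as a "{namespace-uri}" prefix.
print(root.tag)      # {http://www.sitemaps.org/schemas/sitemap/0.9}urlset
print(root[0].tag)   # {http://www.sitemaps.org/schemas/sitemap/0.9}url
```

This is why root.attrib.pop("xmlns", None) in the question has nothing to remove.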

I suggest a simple textual search-and-replace operation, similar to the answer to Modify namespaces in a given xml document with lxml. Something like this:

with open("input.xml", "r") as infile, open("output.xml", "w") as outfile:
    data = infile.read()
    data = data.replace(' xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"', '')
    outfile.write(data)

See also How to insert namespace and prefixes into an XML string with Python?
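If you would rather stay inside ElementTree than edit the raw text, one possible alternative (not the approach suggested above) is to strip the "{uri}" prefix from every tag after parsing. The sketch below assumes Python 3 and a document whose attributes are not namespaced, as in the sitemap; strip_namespace is a hypothetical helper name, not a library function:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def strip_namespace(root, uri):
    """Remove the '{uri}' prefix from every element tag in the tree, in place."""
    prefix = "{" + uri + "}"
    for elem in root.iter():
        # Comment and processing-instruction nodes have non-string tags; skip them.
        if isinstance(elem.tag, str) and elem.tag.startswith(prefix):
            elem.tag = elem.tag[len(prefix):]
    return root


xml_text = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    '<url><changefreq>daily</changefreq>'
    '<loc>http://www.example.com</loc></url></urlset>'
)

root = strip_namespace(ET.fromstring(xml_text), SITEMAP_NS)

# With the prefixes gone, the serializer has no namespace left to declare.
result = ET.tostring(root, encoding="unicode")
print(result)
```

Unlike the text replacement, this does not depend on the exact spelling of the declaration in the file (quote style, spacing), but it only rewrites element tags; namespaced attributes would need similar handling.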

Answer 1 (score: 0)

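In the standard library xml.etree.ElementTree there is no special method for removing an attribute, but all attributes are stored in attrib, which is a dict, and any attribute can be removed from attrib like a key from a dict:
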
    import xml.etree.ElementTree as ET

    tree = ET.parse(file_path)
    root = tree.getroot()      

    print(root.attrib)  # {'xyz': '123'}

    root.attrib.pop("xyz", None)  # None is to not raise an exception if xyz does not exist

    print(root.attrib)  # {}

    ET.tostring(root)
    '<urlset> <url> <changefreq>daily</changefreq> <loc>http://www.example.com</loc></url></urlset>'
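For a version that runs without an input file, here is a small sketch using an inline document with a made-up xyz attribute (the attribute and the shortened document are assumptions for illustration; Python 3 is assumed for encoding="unicode"):

```python
import xml.etree.ElementTree as ET

# Hypothetical input: an ordinary attribute "xyz" on the root, no namespace.
root = ET.fromstring(
    '<urlset xyz="123"><url><loc>http://www.example.com</loc></url></urlset>'
)

before = dict(root.attrib)        # copy of the attributes before removal
root.attrib.pop("xyz", None)      # removes the ordinary attribute
root.attrib.pop("xmlns", None)    # no-op: "xmlns" is never a key in attrib

after = ET.tostring(root, encoding="unicode")
print(before)
print(after)
```

Note that this only works for ordinary attributes. The xmlns declaration from the question is a namespace declaration and never appears in attrib, so pop cannot remove it.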