Question

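I'm using Python and trying to take some XML and convert it to a dict. The code works fine, except that some odd text gets added to the element tags, and it then ends up in the dict attribute names. This text appears to be the value of the "xmlns" attribute of "WebServiceGeocodeQueryResult".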

My code looks like this:

import xml.etree.ElementTree as ET
import xml_to_dictionary # This is some code I found, it seems to work fine:
                         # http://code.activestate.com/recipes/410469-xml-as-dictionary/

def doSomeStuff():
    theXML = """
<?xml version="1.0" encoding="utf-8"?>
    <WebServiceGeocodeQueryResult 
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
         xmlns="https://webgis.usc.edu/">

        <TransactionId>7307e84c-d0c8-4aa8-9b83-8ab4515db9cb</TransactionId>
        <Latitude>38.8092475915888</Latitude>
        <Longitude>-77.2378689948621</Longitude>
        ...
"""

    tree = ET.XML(result.content)   # this is where the element names get the added '{https://webgis.usc.edu/}'
    xmldict = xml_to_dictionary.XmlDictConfig(tree)

As seen in the debugger, the element names in the "tree" object carry the annoying prefix "{https://webgis.usc.edu/}".

This prefix then carries over into the dict attribute names.

Answer 1

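The "odd text" is the element's namespace. ElementTree expands element names to universal names of the form "{namespace-uri}localname", so every element under the default namespace declared by "xmlns" gets that URI prepended to its tag.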
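
To see the expansion in isolation, here is a minimal sketch. The sample string is a trimmed stand-in for the XML in the question (only one child element is kept), not the full response:

```python
import xml.etree.ElementTree as ET

# Trimmed sample modeled on the question's XML; an assumption for illustration.
SAMPLE = (
    '<WebServiceGeocodeQueryResult xmlns="https://webgis.usc.edu/">'
    '<Latitude>38.8092475915888</Latitude>'
    '</WebServiceGeocodeQueryResult>'
)

root = ET.fromstring(SAMPLE)

# Each tag is stored as '{namespace-uri}localname'.
tags = [elem.tag for elem in root.iter()]
print(tags)
# ['{https://webgis.usc.edu/}WebServiceGeocodeQueryResult',
#  '{https://webgis.usc.edu/}Latitude']
```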

You can preprocess the element names to strip the namespace, like this:

tree = ET.XML(thexml)
et = ET.ElementTree(tree)  # wrap the root so that iteration includes the root node
for elem in et.iter():  # iter() replaces getiterator(), which was removed in Python 3.9
    elem.tag = elem.tag.split('}', 1)[-1]  # keep only the part after '}'
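
For a version that can be run on its own, the same idea can be wrapped in a small helper. This is only a sketch: the name strip_namespaces and the trimmed sample XML are made up for illustration, and a plain dict comprehension stands in for the xml_to_dictionary recipe used in the question.

```python
import xml.etree.ElementTree as ET

def strip_namespaces(root):
    """Remove the '{uri}' prefix from every element tag, in place.

    Hypothetical helper, not part of ElementTree.
    """
    for elem in root.iter():  # iter() on an Element includes the element itself
        # Defensive check: comment/PI nodes (if present) have non-string tags.
        if isinstance(elem.tag, str):
            elem.tag = elem.tag.split('}', 1)[-1]
    return root

# Trimmed sample modeled on the question's XML; an assumption for illustration.
SAMPLE = (
    '<WebServiceGeocodeQueryResult xmlns="https://webgis.usc.edu/">'
    '<Latitude>38.8092475915888</Latitude>'
    '<Longitude>-77.2378689948621</Longitude>'
    '</WebServiceGeocodeQueryResult>'
)

root = strip_namespaces(ET.fromstring(SAMPLE))

# Simple stand-in for the dict conversion: one key per direct child.
flat = {child.tag: child.text for child in root}
print(root.tag)  # WebServiceGeocodeQueryResult
print(flat)      # {'Latitude': '38.8092475915888', 'Longitude': '-77.2378689948621'}
```

Note that this only rewrites element tags. Namespaced attribute names are left untouched, and if a document mixes several namespaces that reuse the same local name, stripping the prefix makes those elements indistinguishable.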

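By the way, if cElementTree is available, you should use it because it is faster (import xml.etree.cElementTree as ET). This applies to Python 2 and early Python 3 only: since Python 3.3, xml.etree.ElementTree uses the C accelerator automatically when it is available, and the cElementTree module was removed in Python 3.9.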
