In the lxml FAQ, they offer the following:
How can I map an XML tree into a dict of dicts?
I'm glad you asked:
    def recursive_dict(element):
        return element.tag, \
            dict(map(recursive_dict, element)) or element.text
But when I try to use it, I get the following:
>>> r = requests.get('http://localhost:8983/solr/admin/cores?action=STATUS')
>>> xml_dict = recursive_dict(lxml.etree.parse(StringIO.StringIO(r.content)))
AttributeError: 'lxml.etree._ElementTree' object has no attribute 'tag'
Am I missing a step that converts the ElementTree into an Element?
Answer 0 (score: 3)
lxml.etree.parse returns an ElementTree object, not an Element object. From the documentation:
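An ElementTree is mainly a document wrapper around a tree with a root node. It provides a couple of methods for serialisation and general document handling.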
ElementTree.getroot() returns the root element of the document, so pass that to recursive_dict instead of the tree itself:
xml_doc = lxml.etree.parse(StringIO.StringIO(r.content))
xml_dict = recursive_dict(xml_doc.getroot())
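The fix above can be tried without a running Solr instance. The following is a minimal sketch for Python 3, assuming a made-up XML payload (not real Solr output) and using io.BytesIO in place of the Python 2 StringIO.StringIO. The fallback import is only there so the snippet also runs where lxml is not installed; the standard library's ElementTree exposes the same parse() and getroot() calls.

```python
import io

try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree


def recursive_dict(element):
    # The FAQ version: returns a (tag, contents) pair.
    return element.tag, dict(map(recursive_dict, element)) or element.text


# Hypothetical payload standing in for r.content.
xml_bytes = b"<response><status>0</status><core><name>collection1</name></core></response>"

tree = etree.parse(io.BytesIO(xml_bytes))  # an ElementTree, which has no .tag
root = tree.getroot()                      # an Element, which does

tag, contents = recursive_dict(root)
print(tag, contents)
```

Passing tree instead of root to recursive_dict reproduces the AttributeError from the question.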
Edit:
Here is a variant of recursive_dict that may be better suited:
    def recursive_dict(element):
        retval = {}
        retval["tag"] = element.tag
        if element.text:
            retval["text"] = element.text
        if element.tail:
            retval["tail"] = element.tail
        if element.attrib:
            retval["attributes"] = element.attrib
        if len(element) > 0:
            retval["children"] = [recursive_dict(child_element) for child_element in element]
        return retval
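To see what shape this variant produces, here is a small sketch run on a made-up document. It differs from the code above in one assumed detail: the attributes are copied into a plain dict with dict(element.attrib), so the result compares and prints cleanly. The fallback import again only exists so the snippet runs without lxml.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree


def recursive_dict(element):
    retval = {"tag": element.tag}
    if element.text:
        retval["text"] = element.text
    if element.tail:
        retval["tail"] = element.tail
    if element.attrib:
        # Assumption: copy into a plain dict instead of keeping the attrib proxy.
        retval["attributes"] = dict(element.attrib)
    if len(element) > 0:
        retval["children"] = [recursive_dict(child) for child in element]
    return retval


# Hypothetical input: two children sharing the same tag.
root = etree.fromstring('<cores><core name="a">x</core><core name="b"/></cores>')
result = recursive_dict(root)
print(result)
```

Because children are kept in a list, repeated tags and document order both survive, at the cost of having to index into "children" rather than look up a child by tag.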
Answer 1 (score: 0)
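I do realize I'm about 7.5 years late on this, but the suboptimal implementation is still unchanged in the FAQ. I want to share my solution here because this page is a prominent search result for this problem, and someone might end up finding it useful.
For my use case, I wanted a version somewhere between what is in the FAQ and what codeape provided. This version lets you access child nodes by tag alone, but if several children share the same tag, you get a list of dicts instead of a single dict holding only the last value. It should also be easy to adapt if you need more bells and whistles.
Here's what I ended up using: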
    def recursive_dict(element):
        """Takes an lxml element and returns a corresponding nested python dictionary.
        If there are multiple child elements with the same tag, it will have a list of them.
        Improvement on https://lxml.de/FAQ.html#how-can-i-map-an-xml-tree-into-a-dict-of-dicts"""
        # Trivial case returns only the element text.
        if len(element) == 0:
            return element.text
        # Nested case returns a proper dictionary.
        else:
            retval = {}
            for child in element:
                # Recursive call computed, but not placed yet.
                recurse = recursive_dict(child)
                # No previous entry means it's now a single entry.
                if child.tag not in retval:
                    retval[child.tag] = recurse
                # Previous single entry means it's now a list.
                elif not isinstance(retval[child.tag], list):
                    oldval = retval[child.tag]
                    retval[child.tag] = [oldval, recurse]
                # Previous list entry means the list gets appended.
                else:
                    retval[child.tag].append(recurse)
            return retval
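The difference from the FAQ version shows up as soon as two siblings share a tag. The sketch below runs both on a made-up document. The name faq_recursive_dict is introduced here only to keep the two functions apart, and the fallback import is there so the snippet runs without lxml.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree


def faq_recursive_dict(element):
    # FAQ version, renamed for this comparison (hypothetical name).
    return element.tag, dict(map(faq_recursive_dict, element)) or element.text


def recursive_dict(element):
    # List-aware version from this answer, condensed.
    if len(element) == 0:
        return element.text
    retval = {}
    for child in element:
        value = recursive_dict(child)
        if child.tag not in retval:
            retval[child.tag] = value
        elif not isinstance(retval[child.tag], list):
            retval[child.tag] = [retval[child.tag], value]
        else:
            retval[child.tag].append(value)
    return retval


# Hypothetical input with a repeated <core> tag.
root = etree.fromstring(
    "<cores><core><name>a</name></core><core><name>b</name></core><count>2</count></cores>"
)

faq_result = faq_recursive_dict(root)[1]  # later <core> overwrites the earlier one
list_result = recursive_dict(root)        # both <core> entries kept in a list
print(faq_result)
print(list_result)
```

One consequence of this design is that a given key may hold either a single value or a list, depending on how many children carried that tag, so callers that expect repeats may want to normalize to a list themselves.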