Question

I'm trying to query an XML document so that I can print the attributes of higher-level elements alongside the lower-level elements they contain. The results I'm getting don't match the XML structure. This is the code I have so far:

import xml.etree.ElementTree as ET

tree = ET.parse('movies2.xml')
root = tree.getroot()

for child in root:
    print(child.tag, child.attrib)
print()

mov = root.findall("./genre/decade/movie/[year='2000']")
for movie in mov:
    print(child.attrib['category'], movie.attrib['title'])

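This produces:

genre {'category': 'Action'}
genre {'category': 'Thriller'}
genre {'category': 'Comedy'}

Comedy X-Men
Comedy American Psycho

If you check the XML, the last two lines should actually list the two different genre attributes associated with the movie titles:

Action X-Men
Thriller American Psycho

For reference, the XML in movies2.xml nests each movie as genre > decade > movie, with the genre's category and the movie's title stored as attributes and the year as a child element.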

Answer 1

Your initial loop:

for child in root:
    print(child.tag, child.attrib)
print()

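leaves child bound to the last child of root, so child.attrib['category'] will always be the last child's category. In your case, the last child is Comedy. For each movie in the second loop: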

for movie in mov:
   print(child.attrib['category'], movie.attrib['title'])

you are printing the category of the last child found in the first loop, so both lines print "Comedy".
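A minimal sketch of that behaviour, using a small inline XML string as a stand-in for movies2.xml (the root tag name and the sample data are assumptions, not the asker's file). A Python for loop does not create a new scope, so the loop variable stays bound to the last item after the loop finishes:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for movies2.xml: three empty genre elements.
SAMPLE = """<collection>
  <genre category="Action"/>
  <genre category="Thriller"/>
  <genre category="Comedy"/>
</collection>"""

root = ET.fromstring(SAMPLE)

for child in root:
    pass  # the loop body is irrelevant for this demonstration

# The loop variable outlives the loop and still refers to the last element.
last_category = child.attrib['category']
print(last_category)  # prints: Comedy
```

Any code that reads child after this point sees the Comedy element, no matter which movie it is currently handling.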

Edit: this will at least select the same movies with the correct genre labels, though possibly in a different order:

for child in root:
    mov = child.findall("./decade/movie/[year='2000']")
    for movie in mov:
        print(child.attrib['category'], movie.attrib['title'])
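For anyone who wants to try the nested-loop approach without the original file, here is a self-contained sketch. The inline XML is a trimmed-down, hypothetical stand-in for movies2.xml (the root tag, the years attribute, and the placeholder comedy title are assumptions), and movies_by_year is a helper name made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for movies2.xml.
SAMPLE = """<collection>
  <genre category="Action">
    <decade years="2000s">
      <movie title="X-Men"><year>2000</year></movie>
    </decade>
  </genre>
  <genre category="Thriller">
    <decade years="2000s">
      <movie title="American Psycho"><year>2000</year></movie>
    </decade>
  </genre>
  <genre category="Comedy">
    <decade years="1990s">
      <movie title="Placeholder Comedy"><year>1999</year></movie>
    </decade>
  </genre>
</collection>"""


def movies_by_year(root, year):
    """Return (category, title) pairs for movies whose <year> text equals year."""
    pairs = []
    for genre in root.findall("genre"):
        # Search only inside this genre, so the category is known to be correct.
        for movie in genre.findall(f"./decade/movie[year='{year}']"):
            pairs.append((genre.attrib["category"], movie.attrib["title"]))
    return pairs


root = ET.fromstring(SAMPLE)
pairs = movies_by_year(root, "2000")
for category, title in pairs:
    print(category, title)
```

Because the search starts from each genre element in turn, results come out grouped by genre rather than in one document-wide pass, which is why the order can differ from the original single findall.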

Another approach, using lxml instead of ElementTree:

from lxml import etree as ET

tree = ET.parse('movies2.xml')
root = tree.getroot()

mov = root.findall("./genre/decade/movie/[year='2000']")
for movie in mov:
    print(movie.getparent().getparent().attrib['category'], movie.attrib['title'])
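If installing lxml is not an option, something similar can probably be done with the standard library. ElementTree elements have no getparent(), but a child-to-parent dictionary can be built once and then used to walk upwards. The sketch below assumes the genre > decade > movie nesting implied by the XPath in the question; the inline XML and its root tag are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for movies2.xml.
SAMPLE = """<collection>
  <genre category="Action">
    <decade><movie title="X-Men"><year>2000</year></movie></decade>
  </genre>
  <genre category="Thriller">
    <decade><movie title="American Psycho"><year>2000</year></movie></decade>
  </genre>
  <genre category="Comedy">
    <decade><movie title="Placeholder Comedy"><year>1999</year></movie></decade>
  </genre>
</collection>"""

root = ET.fromstring(SAMPLE)

# Map every element to its parent, since ElementTree does not store that link.
parent_of = {child: parent for parent in root.iter() for child in parent}

results = []
for movie in root.findall("./genre/decade/movie[year='2000']"):
    genre = parent_of[parent_of[movie]]  # movie -> decade -> genre
    results.append((genre.attrib["category"], movie.attrib["title"]))

for category, title in results:
    print(category, title)
```

This keeps the single document-order query from the question while still looking up the right genre for each movie.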

How do I correctly extract information from this XML?
