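I want to group the output of the code below under the keys Action and Thriller. Only 2 keys should appear: {'Action': [list of movies], 'Thriller': [list of movies]}. New code is welcome too, for example using lxml or BeautifulSoup.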
import xml.etree.ElementTree as ET
from collections import defaultdict
tree = ET.parse('movies.xml')
root = tree.getroot()
d = {}
for child in root:
    #print( child.attrib.values())
    for movie in root.findall("./genre/decade/movie[@title]"):
        #print(movie.attrib)
        #print (list(movie.attrib.values())[1])
        d[child.attrib.values()]=list(movie.attrib.values())[1]
d
{dict_values(['Action']): 'Indiana Jones: The raiders of the lost Ark',
dict_values(['Action']): 'THE KARATE KID',
dict_values(['Action']): 'Back 2 the Future',
dict_values(['Action']): 'X-Men',
dict_values(['Action']): 'Batman Returns',
dict_values(['Action']): 'Reservoir Dogs',
dict_values(['Action']): 'ALIEN',
dict_values(['Action']): "Ferris Bueller's Day Off",
dict_values(['Action']): 'American Psycho',
dict_values(['Thriller']): 'Indiana Jones: The raiders of the lost Ark',
dict_values(['Thriller']): 'THE KARATE KID',
dict_values(['Thriller']): 'Back 2 the Future',
dict_values(['Thriller']): 'X-Men',
dict_values(['Thriller']): 'Batman Returns',
dict_values(['Thriller']): 'Reservoir Dogs',
dict_values(['Thriller']): 'ALIEN',
dict_values(['Thriller']): "Ferris Bueller's Day Off",
dict_values(['Thriller']): 'American Psycho'}
My XML comes from DataCamp, which provides material on scraping. The XML is below; I saved it in a local folder as movies.xml.
<?xml version="1.0" encoding="UTF-8" ?>
<collection>
<genre category="Action">
<decade years="1980s">
<movie favorite="True" title="Indiana Jones: The raiders of the lost Ark">
<format multiple="No">DVD</format>
<year>1981</year>
<rating>PG</rating>
<description>
'Archaeologist and adventurer Indiana Jones
is hired by the U.S. government to find the Ark of the
Covenant before the Nazis.'
</description>
</movie>
<movie favorite="True" title="THE KARATE KID">
<format multiple="Yes">DVD,Online</format>
<year>1984</year>
<rating>PG</rating>
<description>None provided.</description>
</movie>
<movie favorite="False" title="Back 2 the Future">
<format multiple="False">Blu-ray</format>
<year>1985</year>
<rating>PG</rating>
<description>Marty McFly</description>
</movie>
</decade>
<decade years="1990s">
<movie favorite="False" title="X-Men">
<format multiple="Yes">dvd, digital</format>
<year>2000</year>
<rating>PG-13</rating>
<description>Two mutants come to a private academy for their kind whose resident superhero team must
oppose a terrorist organization with similar powers.</description>
</movie>
<movie favorite="True" title="Batman Returns">
<format multiple="No">VHS</format>
<year>1992</year>
<rating>PG13</rating>
<description>NA.</description>
</movie>
<movie favorite="False" title="Reservoir Dogs">
<format multiple="No">Online</format>
<year>1992</year>
<rating>R</rating>
<description>WhAtEvER I Want!!!?!</description>
</movie>
</decade>
</genre>
<genre category="Thriller">
<decade years="1970s">
<movie favorite="False" title="ALIEN">
<format multiple="Yes">DVD</format>
<year>1979</year>
<rating>R</rating>
<description>"""""""""</description>
</movie>
</decade>
<decade years="1980s">
<movie favorite="True" title="Ferris Bueller's Day Off">
<format multiple="No">DVD</format>
<year>1986</year>
<rating>PG13</rating>
<description>Funny movie about a funny guy</description>
</movie>
<movie favorite="FALSE" title="American Psycho">
<format multiple="No">blue-ray</format>
<year>2000</year>
<rating>Unrated</rating>
<description>psychopathic Bateman</description>
</movie>
</decade>
</genre>
</collection>
Answer 0 (score: 0)
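Your code reaches the data; the issue is how you build the dictionary from it. On a dictionary, .values() returns a view of the values, which you can store in a list if you need to. In this case you want the value itself, so just select it by key: child.attrib['category']. Once you have that, all that is left is to update the dictionary. Here we use defaultdict, which returns an empty list the first time a key is encountered, so we can append movie titles to it.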
import xml.etree.ElementTree as ET
from collections import defaultdict

tree = ET.parse('movies.xml')
root = tree.getroot()
d = defaultdict(list)
for child in root:  # each child is a <genre> element
    # Search from child, not root, so each genre collects only its own movies.
    for movie in child.findall("./decade/movie[@title]"):
        d[child.attrib['category']].append(movie.attrib['title'])
>>> d
defaultdict(list,
            {'Action': ['Indiana Jones: The raiders of the lost Ark',
                        'THE KARATE KID',
                        'Back 2 the Future',
                        'X-Men',
                        'Batman Returns',
                        'Reservoir Dogs'],
             'Thriller': ['ALIEN',
                          "Ferris Bueller's Day Off",
                          'American Psycho']})
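To see why the keys in the question looked like dict_values(['Action']), the two ideas above can be tried in isolation. This is only a small sketch: the attrib dictionary is a hand-written stand-in for child.attrib, not something read from movies.xml.

```python
from collections import defaultdict

# Hypothetical stand-in for child.attrib of a <genre> element.
attrib = {"category": "Action"}

# .values() gives a view object, not the string inside it.
view = attrib.values()
print(type(view).__name__)  # dict_values
print(list(view))           # ['Action']

# Selecting by key gives the plain string, which works well as a dictionary key.
key = attrib["category"]
print(key)                  # Action

# defaultdict(list) creates an empty list the first time a key is accessed,
# so append works without checking whether the key already exists.
d = defaultdict(list)
d[key].append("X-Men")
d[key].append("Batman Returns")
print(dict(d))              # {'Action': ['X-Men', 'Batman Returns']}
```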
If you only want, say, the Action movies, you can select them like any ordinary dictionary key.
d['Action']
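Since the question was open to other approaches, one possible variant stays with ElementTree but builds a plain dict with a comprehension, scoping the movie search to each genre element. Treat it as a sketch under assumptions: the inline XML is a shortened stand-in for movies.xml (same nesting, fewer movies and fields), and the function name titles_by_genre is made up for this example.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for movies.xml: same genre/decade/movie nesting.
SAMPLE = """
<collection>
  <genre category="Action">
    <decade years="1980s">
      <movie favorite="True" title="THE KARATE KID"/>
    </decade>
    <decade years="1990s">
      <movie favorite="False" title="X-Men"/>
    </decade>
  </genre>
  <genre category="Thriller">
    <decade years="1970s">
      <movie favorite="False" title="ALIEN"/>
    </decade>
  </genre>
</collection>
"""

def titles_by_genre(root):
    """Map each genre's category to the titles of the movies inside it."""
    return {
        # genre.iter("movie") only walks the subtree of this genre,
        # so titles from other genres cannot leak in.
        genre.get("category"): [m.get("title") for m in genre.iter("movie")]
        for genre in root.iter("genre")
    }

root = ET.fromstring(SAMPLE)
result = titles_by_genre(root)
print(result)  # {'Action': ['THE KARATE KID', 'X-Men'], 'Thriller': ['ALIEN']}
```

For the real file, ET.parse('movies.xml').getroot() could be passed in place of the sample root.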