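Extracting information from XML with Python and outputting it as a list

Date: 2015-11-13 22:34:07

Tags: python xml parsing

I'm trying to extract data from this XML document and get the output as a single list: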

For example:

['10-Yard Fight (USA, Europe)', '1942 (Japan, USA)', .......]

So far I have only figured out how to produce many separate lists.

For example:

['10-Yard Fight (USA, Europe)']
['1942 (Japan, USA)']
[.......]

XML example:

<?xml version="1.0"?>
<menu>
<header>
    <listname>Nintendo Entertainment System</listname>
    <id>003</id>
    <lastlistupdate>10/16/2014</lastlistupdate>
    <listversion>1.1 Final</listversion>
    <manufacturer>Nintendo</manufacturer>
    <media>
        <artwork></artwork>
        <video></video>
    </media>
    <exporterversion>HyperList XML Exporter Version 1.3 Copywrite (c) 2009-2011 William Strong</exporterversion>
</header>
<game name="10-Yard Fight (USA, Europe)" index="true" image="1" id="0034232">
    <description>10-Yard Fight (USA, Europe)</description>
    <cloneof></cloneof>
    <crc>3D564757</crc>
    <manufacturer>Nintendo</manufacturer>
    <year>1985</year>
    <genre>Football/Sports</genre>
    <rating>HSRS - GA (General Audience)</rating>
    <enabled>Yes</enabled>
</game>
<game name="1942 (Japan, USA)" index="" image="">
    <description>1942 (Japan, USA)</description>
    <cloneof></cloneof>
    <crc>171251E3</crc>
    <manufacturer>Capcom</manufacturer>
    <year>1986</year>
    <genre>Shoot-&apos;Em-Up</genre>
    <rating>HSRS - GA (General Audience)</rating>
    <enabled>Yes</enabled>
</game>
<game name="1943 - The Battle of Midway (USA)" index="" image="">
    <description>1943 - The Battle of Midway (USA)</description>
    <cloneof></cloneof>
    <crc>12C6D5C7</crc>
    <manufacturer>Capcom</manufacturer>
    <year>1988</year>
    <genre>Shoot-&apos;Em-Up</genre>
    <rating>HSRS - GA (General Audience)</rating>
    <enabled>Yes</enabled>
</game>
</menu>

My sample Python code:

from xml.dom import minidom

def databaseGameExtraction(xml):
    xmldoc = minidom.parse(xml)
    games = xmldoc.getElementsByTagName('game')
    for game in games:
        romKey = game.attributes['name']
        roms = [romKey.value]
        print(roms)
    return roms

databaseGameExtraction('Nintendo Entertainment System.xml')

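I would also like the value "Nintendo Entertainment System" to be returned.

In a perfect world, when called from another function, this function would return the roms as a list and also return the system name.

Thanks,

  • A very novice coder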

2 answers:

Answer 0 (score: 0)

I think you need to create the list once, before the loop, and append to it:

roms = []

for game in games:
    romKey = game.attributes['name']
    roms.append(romKey.value)

print("all roms:", roms)
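The code in the question prints separate lists because `roms = [romKey.value]` creates a brand-new one-element list on every pass through the loop, so the function ends up returning only the last game. A minimal self-contained sketch of the accumulating version follows. It assumes a trimmed-down copy of the question's XML held in a string (`SAMPLE_XML` and `collect_rom_names` are illustrative names, not part of the original code) and uses `minidom.parseString` so it runs without the file on disk.

```python
from xml.dom import minidom

# Trimmed-down stand-in for the question's XML file (assumed to have the same structure).
SAMPLE_XML = """<?xml version="1.0"?>
<menu>
<game name="10-Yard Fight (USA, Europe)" index="true" image="1"/>
<game name="1942 (Japan, USA)" index="" image=""/>
</menu>"""


def collect_rom_names(xml_text):
    """Return the name attribute of every <game> element as one list."""
    xmldoc = minidom.parseString(xml_text)
    roms = []  # created once, before the loop
    for game in xmldoc.getElementsByTagName('game'):
        # Append to the same list instead of rebinding roms each time.
        roms.append(game.attributes['name'].value)
    return roms


print(collect_rom_names(SAMPLE_XML))
```

With the real file, `minidom.parse('Nintendo Entertainment System.xml')` would take the place of `minidom.parseString(SAMPLE_XML)`.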

Answer 1 (score: 0)

You need to build the roms list iteratively from the XML:

roms = []
for game in games:
    rom_key = game.attributes['name']
    roms.append(rom_key.value)

Or, better, write it as a list comprehension:

roms = [game.attributes['name'].value for game in games]

You can also extract "Nintendo Entertainment System" with:

xmldoc.getElementsByTagName('listname')[0].firstChild.data
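One caveat with this expression: it raises an error if the tag is missing (`[0]` on an empty result) or if the element is empty (`firstChild` is `None`, as it would be for `<artwork></artwork>` in the sample). The sketch below wraps the lookup in a small helper that falls back to a default instead. The helper name `first_text` and the trimmed `SAMPLE_XML` string are assumptions made for illustration only.

```python
from xml.dom import minidom

# Trimmed-down stand-in for the header of the question's XML (assumed structure).
SAMPLE_XML = """<?xml version="1.0"?>
<menu>
<header>
    <listname>Nintendo Entertainment System</listname>
    <media>
        <artwork></artwork>
    </media>
</header>
</menu>"""


def first_text(xmldoc, tag, default=None):
    """Return the text of the first <tag> element, or default if it is missing or empty."""
    elements = xmldoc.getElementsByTagName(tag)
    if not elements or elements[0].firstChild is None:
        return default
    return elements[0].firstChild.data


xmldoc = minidom.parseString(SAMPLE_XML)
print(first_text(xmldoc, 'listname'))             # element with text
print(first_text(xmldoc, 'artwork', default=''))  # empty element, falls back to ''
print(first_text(xmldoc, 'nosuchtag'))            # missing element, falls back to None
```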

Which leaves us with:

from xml.dom import minidom

def databaseGameExtraction(xml):
    xmldoc = minidom.parse(xml)
    # Collect the name attribute of every <game> element into one list.
    roms = [game.attributes['name'].value
            for game in xmldoc.getElementsByTagName('game')]
    # The text of the first <listname> element is the system name.
    system_name = xmldoc.getElementsByTagName('listname')[0].childNodes[0].data
    return roms, system_name

roms, system_name = databaseGameExtraction('Nintendo Entertainment System.xml')
print(system_name)
print(roms)
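As an alternative to minidom (not what the answers above use), the same extraction can be sketched with the standard library's `xml.etree.ElementTree`, which tends to be a little shorter for this kind of lookup. The function name `extract_games` and the trimmed `SAMPLE_XML` string are assumptions for illustration. The path `'header/listname'` relies on the layout shown in the question's sample.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the question's XML file (assumed structure).
SAMPLE_XML = """<?xml version="1.0"?>
<menu>
<header>
    <listname>Nintendo Entertainment System</listname>
</header>
<game name="10-Yard Fight (USA, Europe)" index="true" image="1">
    <description>10-Yard Fight (USA, Europe)</description>
</game>
<game name="1942 (Japan, USA)" index="" image="">
    <description>1942 (Japan, USA)</description>
</game>
</menu>"""


def extract_games(xml_text):
    """Return (list of game names, system name) from the XML text."""
    root = ET.fromstring(xml_text)  # root is the <menu> element
    # Element.get() reads an attribute; iter() walks all <game> descendants.
    roms = [game.get('name') for game in root.iter('game')]
    # findtext() returns the element's text, or None if the path is not found.
    system_name = root.findtext('header/listname')
    return roms, system_name


roms, system_name = extract_games(SAMPLE_XML)
print(system_name)
print(roms)
```

To read from the file instead of a string, `ET.parse('Nintendo Entertainment System.xml').getroot()` would replace the `ET.fromstring(...)` call.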