Parsing an XML file with multiple tags, attributes, and values

Asked: 2018-07-27 04:00:28

Tags: python python-3.x

I am trying to parse an XML file with the following structure:

<game gameId="cricket">
    <Period duration="1year" endTime="2017-12-31"/>
    <repPeriod duration="1year"/>
    <player p="1">sachin</player>
    <player p="2">rahul</player>
    <player p="3">saurav</player>
    <player p="4">kapil</player>
    <player p="5">sanjay</player>
    <player p="6">kartik</player>
    <player p="7">michel</player>
    <player p="8">rickey</player>
    <ranking period="2016">
        <r p="1">3</r>
    </ranking>
    <ranking period="DEFAULT">
        <r p="2">4</r>
        <r p="3">16</r>
        <r p="4">16</r>
        <r p="5">6</r>
        <r p="6">3</r>
        <r p="7">7</r>
        <r p="8">7</r>
    </ranking>
</game>

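I can't figure out how to map a player with a given attribute, such as p="1", to the corresponding ranking value.

The output I want is:

player:rank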

sachin:3

rahul:4

My code so far:

from xml.dom import minidom

doc = minidom.parse('report.xml')
node = doc.documentElement
gameinfo = doc.getElementsByTagName("game")

counterlist = ['cricket','football']
for gameid in gameinfo:
    for counter in counterlist:
        if gameid.getAttribute('game') == counter:
            itemlist = counter.getElementsByTagName("player")
            i = len(itemlist)
            j = 1
            while j<=i:
                for itemnumber in itemlist:
                    if itemnumber.getAttribute('p') == j:
                        Playername = gameid.getElementsByTagName("player")[j].childNodes[0].data
                        rankid = gameid.getElementsByTagName("r")[j].childNodes[0].data
                        print (playername : rankid)

                j = j+1

2 Answers:

Answer 0 (score: 0)

Use ElementTree.

For example:

import xml.etree.ElementTree as ET
from collections import defaultdict

tree = ET.parse('report.xml')  # the file name used in the question
root = tree.getroot()
d = defaultdict(list)

for tag in root.findall(".//*[@p]"):          #Find all tags with 'p' attrib
    d[tag.attrib['p']].append(tag.text)

for i in d.values():
    print("{} : {}".format(i[0], i[1]))

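Output (the order may vary with the Python version, because older versions do not preserve dict insertion order):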

sachin : 3
saurav : 16
rahul : 4
sanjay : 6
kapil : 16
michel : 7
kartik : 3
rickey : 7
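
The snippet above groups every element that has a `p` attribute and relies on each `player` appearing before its `r` in the document. As a minimal sketch of a more explicit variant, the example below looks up `player` and `r` elements separately. The inline `XML` string is a trimmed copy of the question's data, and the helper name `player_ranks` is hypothetical, introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question (assumption: report.xml has this shape).
XML = """<game gameId="cricket">
    <player p="1">sachin</player>
    <player p="2">rahul</player>
    <player p="3">saurav</player>
    <ranking period="2016">
        <r p="1">3</r>
    </ranking>
    <ranking period="DEFAULT">
        <r p="2">4</r>
        <r p="3">16</r>
    </ranking>
</game>"""


def player_ranks(root):
    """Return (player name, rank text) pairs in the order the rankings appear."""
    # Map each player's 'p' attribute to the player's name.
    names = {player.get("p"): player.text for player in root.iter("player")}
    pairs = []
    for r in root.iter("r"):
        name = names.get(r.get("p"))
        if name is not None:  # skip rankings that have no matching player
            pairs.append((name, r.text))
    return pairs


root = ET.fromstring(XML)
pairs = player_ranks(root)
for name, rank in pairs:
    print("{} : {}".format(name, rank))
```

Because the names are stored in a dictionary keyed by `p`, a player without a ranking is simply never printed instead of raising an `IndexError`.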

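Answer 1 (score: 0)

The simplest approach is to build a dictionary that stores the player names and IDs (i.e., store <player p="1">sachin</player> as { '1': 'sachin' }), then iterate over the rankings and use the stored player names to produce your output.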

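from xml.dom import minidom

doc = minidom.parse('report.xml')  # the same document as in the question

# collect player name and ID
pdic = {}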
playerlist = doc.getElementsByTagName("player")
for item in playerlist:
    pdic[ item.getAttribute('p') ] = item.childNodes[0].data

# get all the rankings
for r in doc.getElementsByTagName('r'):
    # get attribute `p` and find it in our dictionary
    if r.getAttribute('p') in pdic:
        print( pdic[r.getAttribute('p')] + ": " + r.childNodes[0].data )

Output:

sachin: 3
rahul: 4
saurav: 16
kapil: 16
sanjay: 6
kartik: 3
michel: 7
rickey: 7
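
The question's code also tries to restrict the lookup to certain games, but it reads the attribute `game` while the XML uses `gameId`. The following is a rough sketch of how the dictionary approach could be combined with that filter, assuming each `game` element carries its own `player` and `r` elements. The inline `XML` string is a trimmed copy of the question's data, and the function name `ranks_for_games` is hypothetical.

```python
from xml.dom import minidom

# Trimmed copy of the XML from the question (assumption: report.xml has this shape).
XML = """<game gameId="cricket">
    <player p="1">sachin</player>
    <player p="2">rahul</player>
    <ranking period="2016">
        <r p="1">3</r>
    </ranking>
    <ranking period="DEFAULT">
        <r p="2">4</r>
    </ranking>
</game>"""


def ranks_for_games(doc, wanted_ids):
    """Return (gameId, player name, rank) tuples for the requested games."""
    results = []
    for game in doc.getElementsByTagName("game"):
        game_id = game.getAttribute("gameId")
        if game_id not in wanted_ids:
            continue
        # Map the 'p' attribute of each player in this game to the player's name.
        names = {
            player.getAttribute("p"): player.firstChild.data
            for player in game.getElementsByTagName("player")
        }
        for r in game.getElementsByTagName("r"):
            key = r.getAttribute("p")
            if key in names:
                results.append((game_id, names[key], r.firstChild.data))
    return results


doc = minidom.parseString(XML)
results = ranks_for_games(doc, {"cricket", "football"})
for game_id, name, rank in results:
    print(name + ": " + rank)
```

Note that `getAttribute` always returns a string, so comparisons against the `p` value should use strings such as `'1'` rather than integers.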