Parsing an XML file with missing data

Time: 2018-09-13 10:26:45

Tags: python xml

I want to extract information from a USEBIO XML file, but I am having trouble with missing data. The relevant part of the file structure is:

<PAIR>
    <PAIR_NUMBER>1</PAIR_NUMBER>
…
    <PLACE>12=</PLACE>
…
    <PLAYER RATEABLE="Y">
           <PLAYER_NAME>Douglas Adams</PLAYER_NAME>
…
           <NATIONAL_ID_NUMBER>194576</NATIONAL_ID_NUMBER>
    </PLAYER>
    <PLAYER RATEABLE="Y">
           <PLAYER_NAME>Arthur Dent</PLAYER_NAME>
…
       <NATIONAL_ID_NUMBER>903493</NATIONAL_ID_NUMBER>
    </PLAYER>
</PAIR>

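There can be any number of pairs at any given place, and each pair always has exactly two players. I want to build a list of 3-tuples, one per player: (place, player name, national_id_number). The problem is that national_id_number is optional, and when it is missing the tag is simply absent.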

I tried:

import xml.etree.ElementTree as ET

tree = ET.parse(EVENTS[event].filename)
results = []
for pair in tree.findall('.//PAIR'):
    place = pair.find('.//PLACE').text
    names = []
    for name in pair.findall('.//PLAYER_NAME'):
        names.append(name.text)
    numbers = []
    for num in pair.findall('.//NATIONAL_ID_NUMBER'):
        numbers.append(num.text)
    for name, ebunum in zip(names,numbers):
        results.append((int(place.replace('=','')),name,int(ebunum)))

However, this drops anyone without a national_id_number. If I use zip_longest with fillvalue=0, I get all the names, but there is no guarantee that the 0 national_id_number is assigned to the right person.
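One way to keep each number attached to the right person is to loop over the PLAYER elements themselves and look up the optional child inside each one, instead of building two separate lists and zipping them. The following is only a sketch under assumptions: the `SAMPLE` document and the `extract_results` helper are made up for illustration (the `EVENT` root is invented, and the first player's number has been removed on purpose), and the 0 fallback simply mirrors the fillvalue mentioned above.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape shown above. The first player deliberately
# has no NATIONAL_ID_NUMBER element; the EVENT root is an assumption.
SAMPLE = """
<EVENT>
  <PAIR>
    <PAIR_NUMBER>1</PAIR_NUMBER>
    <PLACE>12=</PLACE>
    <PLAYER RATEABLE="Y">
      <PLAYER_NAME>Douglas Adams</PLAYER_NAME>
    </PLAYER>
    <PLAYER RATEABLE="Y">
      <PLAYER_NAME>Arthur Dent</PLAYER_NAME>
      <NATIONAL_ID_NUMBER>903493</NATIONAL_ID_NUMBER>
    </PLAYER>
  </PAIR>
</EVENT>
"""


def extract_results(root):
    """Return (place, player name, national_id_number) for every player."""
    results = []
    for pair in root.iter('PAIR'):
        # '12=' marks a tied place; drop the '=' before converting.
        place = int(pair.findtext('.//PLACE').replace('=', ''))
        for player in pair.iter('PLAYER'):
            name = player.findtext('PLAYER_NAME')
            # findtext returns the default when the element is absent,
            # so a missing number becomes 0 for this player only.
            number = int(player.findtext('NATIONAL_ID_NUMBER', default='0'))
            results.append((place, name, number))
    return results


results = extract_results(ET.fromstring(SAMPLE))
print(results)
```

For a real file, the root would presumably come from `ET.parse(filename).getroot()` rather than `ET.fromstring`. Because the lookup happens per PLAYER, a missing number can never shift onto the other member of the pair.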

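This is a newbie question, because that is what I am. I am a beginner trying to write a program to help run my local club, and my knowledge of XML parsing in Python is less than 36 hours old. Any help you can offer would be much appreciated.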

This is what I am doing at the moment, but I would prefer something more Pythonic:

def missing_ebu_number(place,name,results):
    results.append((place,name,0))
    print('Missing EBU number for: {name}.\nPlaced: {place}'
          ' in {event}\nEBU number for {name} set to 0\n'
          .format(name=name,place=place,event=event))

results = []
ebunumexpected = False
# The with block closes the file even if a line fails to parse.
with open(EVENTS[event].filename, 'r', encoding=EBU_ENCODING) as fh:
    for line in fh:
        if '<PLACE>' in line:
            if ebunumexpected:
                missing_ebu_number(place,name,results)                    
            ebunumexpected = False
            place = int(line.replace('=','')
                        .strip()
                        .lstrip('<PLACE>')
                        .rstrip('</PLACE>'))
        elif '<PLAYER_NAME>' in line:
            if ebunumexpected:
                missing_ebu_number(place,name,results)                    

            namebits = line.split('>',1)
            name = namebits[-1].split('<')[0]
            ebunumexpected = True

        elif '<NATIONAL_ID_NUMBER>' in line:
            ebunum = int(line.strip()
                        .lstrip('<NATIONAL_ID_NUMBER>')
                        .rstrip('</NATIONAL_ID_NUMBER>'))                

            results.append((place,name,ebunum))
            ebunumexpected = False
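A side note on the line-based version: `str.lstrip` and `str.rstrip` remove any characters from the given set, not a literal prefix or suffix. It works for PLACE and NATIONAL_ID_NUMBER only because digits never appear in the tag names. The sketch below shows what happens with a name, plus a hypothetical `tag_text` helper (not part of the code above) that pulls the text out with a regular expression instead. It assumes the opening and closing tag sit on the same line.

```python
import re

line = '    <PLAYER_NAME>Arthur Dent</PLAYER_NAME>\n'

# lstrip treats its argument as a set of characters, so the 'A' of
# 'Arthur' is removed as well because 'A' occurs in '<PLAYER_NAME>'.
stripped = line.strip().lstrip('<PLAYER_NAME>')
print(stripped)


def tag_text(line, tag):
    """Return the text between <tag> and </tag> on one line, or None."""
    match = re.search(r'<{0}>(.*?)</{0}>'.format(re.escape(tag)), line)
    return match.group(1) if match else None


print(tag_text(line, 'PLAYER_NAME'))
print(tag_text('<PLACE>12=</PLACE>', 'PLACE'))
```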

0 answers:

No answers