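I'm trying to extract information from a USEBIO XML file, but I'm having trouble with missing data. The relevant part of the file structure is: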
<PAIR>
    <PAIR_NUMBER>1</PAIR_NUMBER>
    …
    <PLACE>12=</PLACE>
    …
    <PLAYER RATEABLE="Y">
        <PLAYER_NAME>Douglas Adams</PLAYER_NAME>
        …
        <NATIONAL_ID_NUMBER>194576</NATIONAL_ID_NUMBER>
    </PLAYER>
    <PLAYER RATEABLE="Y">
        <PLAYER_NAME>Arthur Dent</PLAYER_NAME>
        …
        <NATIONAL_ID_NUMBER>903493</NATIONAL_ID_NUMBER>
    </PLAYER>
</PAIR>
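There can be any number of pairs in any given place, and there are always exactly two players in each pair. I want to build a list of 3-tuples, one per player: (place, player name, national_id_number). The problem is that national_id_number is optional, and when it is missing the tag is simply absent.
I have tried: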
tree = ET.parse(EVENTS[event].filename)
results = []
for pair in tree.findall('.//PAIR'):
    place = pair.find('.//PLACE').text
    names = []
    for name in pair.findall('.//PLAYER_NAME'):
        names.append(name.text)
    numbers = []
    for num in pair.findall('.//NATIONAL_ID_NUMBER'):
        numbers.append(num.text)
    for name, ebunum in zip(names, numbers):
        results.append((int(place.replace('=', '')), name, int(ebunum)))
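However, this drops anyone who has no national_id_number, because zip stops at the shorter list. If I use zip_longest with fillvalue=0 instead, I get all the names, but there is no guarantee that the 0 national_id_number is assigned to the right person.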
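The misalignment comes from collecting names and numbers in two separate lists. A minimal sketch of one possible alternative is to loop over each PLAYER element and look up both fields inside it, so a missing number stays attached to the player it belongs to. The `EVENT` root element, the `SAMPLE` string, and the `extract_results` helper are made up for illustration and are not part of the USEBIO file or the code above; the fallback value of 0 is just the placeholder already used in this question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the structure shown above.
# The second player deliberately has no NATIONAL_ID_NUMBER tag.
SAMPLE = """
<EVENT>
  <PAIR>
    <PAIR_NUMBER>1</PAIR_NUMBER>
    <PLACE>12=</PLACE>
    <PLAYER RATEABLE="Y">
      <PLAYER_NAME>Douglas Adams</PLAYER_NAME>
      <NATIONAL_ID_NUMBER>194576</NATIONAL_ID_NUMBER>
    </PLAYER>
    <PLAYER RATEABLE="Y">
      <PLAYER_NAME>Arthur Dent</PLAYER_NAME>
    </PLAYER>
  </PAIR>
</EVENT>
"""


def extract_results(root):
    """Return a list of (place, player name, national id) tuples."""
    results = []
    for pair in root.iter('PAIR'):
        # Tied places carry a trailing '=', e.g. '12='.
        place = int(pair.findtext('.//PLACE').replace('=', ''))
        for player in pair.iter('PLAYER'):
            name = player.findtext('.//PLAYER_NAME')
            # findtext returns the default when the tag is absent.
            idtext = player.findtext('.//NATIONAL_ID_NUMBER', default='').strip()
            ebunum = int(idtext) if idtext else 0
            results.append((place, name, ebunum))
    return results


results = extract_results(ET.fromstring(SAMPLE))
print(results)
# [(12, 'Douglas Adams', 194576), (12, 'Arthur Dent', 0)]
```

With a real file, the same function could presumably be called as `extract_results(ET.parse(filename).getroot())`, assuming the PLAYER elements are nested inside PAIR as in the excerpt.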
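This is a newbie question, because that is what I am. I'm a beginner trying to write a program to help run my local club, and my knowledge of XML parsing in Python is less than 36 hours old, so any help you can offer would be much appreciated.
This is what I'm doing at the moment, but I'd prefer something more Pythonic: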
def missing_ebu_number(place, name, results):
    results.append((place, name, 0))
    print('Missing EBU number for: {name}.\nPlaced: {place}'
          ' in {event}\nEBU number for {name} set to 0\n'
          .format(name=name, place=place, event=event))

try:
    fh = open(EVENTS[event].filename, 'r', encoding=EBU_ENCODING)
    results = []
    ebunumexpected = False
    for line in fh:
        if '<PLACE>' in line:
            if ebunumexpected:
                missing_ebu_number(place, name, results)
                ebunumexpected = False
            place = int(line.replace('=', '')
                        .strip()
                        .lstrip('<PLACE>')
                        .rstrip('</PLACE>'))
        elif '<PLAYER_NAME>' in line:
            if ebunumexpected:
                missing_ebu_number(place, name, results)
            namebits = line.split('>', 1)
            name = namebits[-1].split('<')[0]
            ebunumexpected = True
        elif '<NATIONAL_ID_NUMBER>' in line:
            ebunum = int(line.strip()
                         .lstrip('<NATIONAL_ID_NUMBER>')
                         .rstrip('</NATIONAL_ID_NUMBER>'))
            results.append((place, name, ebunum))
            ebunumexpected = False