Parsing XML with PETL

Time: 2018-04-09 00:11:58

Tags: python xml parsing etl petl

I'm trying to parse the following XML in Python using the PETL package:

<Msg_file LeagueID="00" League="NBA" Season="2012-13" SeasonType="Regular Season">
<Game Number="0">
<Msg_Roster>
  <Player_info Person_id="2734" Team_id="1610612737" Player_status="A" First_name="Devin" Last_name="Harris" Jersey_number="34" Birth_date="February 27, 1983" Height="6'3&quot;" Weight="192" Position="G" School="Wisconsin" SchoolType="College" Country="USA" Display_affiliation="Wisconsin/USA" DraftYear="2004" FreeAgent="N" SeasonExp="8" PlayerCode="devin_harris"></Player_info>
  <Player_info Person_id="201143" Team_id="1610612737" Player_status="A" First_name="Al" Last_name="Horford" Jersey_number="15" Birth_date="June 03, 1986" Height="6'10&quot;" Weight="250" Position="C-F" School="Florida" SchoolType="College" Country="Dominican Republic" Display_affiliation="Florida/Dominican Republic" DraftYear="2007" FreeAgent="N" SeasonExp="5" PlayerCode="al_horford"></Player_info>
  <Player_info Person_id="203098" Team_id="1610612737" Player_status="A" First_name="John" Last_name="Jenkins" Jersey_number="12" Birth_date="March 06, 1991" Height="6'4&quot;" Weight="215" Position="G" School="Vanderbilt" SchoolType="College" Country="USA" Display_affiliation="Vanderbilt/USA" DraftYear="2012" FreeAgent="N" SeasonExp="0" PlayerCode="john_jenkins"></Player_info>
  <Player_info Person_id="201274" Team_id="1610612737" Player_status="A" First_name="Ivan" Last_name="Johnson" Jersey_number="44" Birth_date="April 10, 1984" Height="6'8&quot;" Weight="255" Position="F" School="Cal State San Bernardino" SchoolType="College" Country="USA" Display_affiliation="Cal State San Bernardino/USA" DraftYear="2011" FreeAgent="N" SeasonExp="1" PlayerCode="ivan_johnson"></Player_info>
  <Player_info Person_id="2563" Team_id="1610612737" Player_status="A" First_name="Dahntay" Last_name="Jones" Jersey_number="30" Birth_date="December 27, 1980" Height="6'6&quot;" Weight="225" Position="F" School="Duke" SchoolType="College" Country="USA" Display_affiliation="Duke/USA" DraftYear="2003" FreeAgent="N" SeasonExp="9" PlayerCode="dahntay_jones"></Player_info>
  <Player_info Person_id="2594" Team_id="1610612737" Player_status="A" First_name="Kyle" Last_name="Korver" Jersey_number="26" Birth_date="March 17, 1981" Height="6'7&quot;" Weight="212" Position="F-G" School="Creighton" SchoolType="College" Country="USA" Display_affiliation="Creighton/USA" DraftYear="2003" FreeAgent="N" SeasonExp="9" PlayerCode="kyle_korver"></Player_info>
 </Msg_Roster>
</Game>
</Msg_file>

I'm using the following PETL code:

import petl as etl
table2 = etl.fromxml('nba_rosters.xml','player_info','playercode')

I get this error:

Traceback (most recent call last):
  File "<pyshell#10>", line 1, in <module>
    table2
  File "C:\Python\Python36-32\lib\idlelib\rpc.py", line 617, in displayhook
    text = repr(value)
  File "C:\Python\Python36-32\lib\site-packages\petl\util\vis.py", line 135, in _table_repr
    return str(look(table))
  File "C:\Python\Python36-32\lib\site-packages\petl\util\vis.py", line 122, in __repr__
    truncate=truncate, width=width)
  File "C:\Python\Python36-32\lib\site-packages\petl\util\vis.py", line 197, in _look_grid
    hdr = next(it)
StopIteration

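Any ideas on how to parse this file correctly would be a huge help. I'm new to Python and can successfully parse the example files from the PETL documentation, but I can't translate that into a real-world case.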

1 answer:

Answer 0 (score: 2)

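There are a couple of typos in your keys (XML tag and attribute names are case-sensitive, so player_info and playercode match nothing), and you need one more argument: the row element (Msg_Roster), the element holding each value (Player_info), and the attribute the value is read from (PlayerCode).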

Code:

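import petl as etl

# Rows come from each <Msg_Roster>; each value is the PlayerCode attribute of a <Player_info> inside it
table2 = etl.fromxml('nba_rosters.xml', 'Msg_Roster', 'Player_info', 'PlayerCode')
print(table2)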

Result:

+--------------+------------+--------------+--------------+---------------+-------------+
| devin_harris | al_horford | john_jenkins | ivan_johnson | dahntay_jones | kyle_korver |
+==============+============+==============+==============+===============+=============+
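To see why the lowercase names fail and why the corrected call prints only a header, here is a small stand-in that uses only the standard library's xml.etree.ElementTree. It is not petl's code: rows_from_xml is a hypothetical helper that approximates the row/value/attribute form of fromxml, and the XML is a trimmed copy of the roster above. It also assumes petl hands the row path to ElementTree-style path matching unchanged. Under that assumption a bare 'Msg_Roster' only matches a direct child of the root, so on the full file from the question (where Msg_Roster sits under Msg_file/Game) a descendant path such as './/Msg_Roster' may be needed.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the roster from the question (only a few attributes kept).
XML = """<Msg_file LeagueID="00" League="NBA" Season="2012-13" SeasonType="Regular Season">
<Game Number="0">
<Msg_Roster>
  <Player_info Person_id="2734" Last_name="Harris" PlayerCode="devin_harris"></Player_info>
  <Player_info Person_id="201143" Last_name="Horford" PlayerCode="al_horford"></Player_info>
  <Player_info Person_id="203098" Last_name="Jenkins" PlayerCode="john_jenkins"></Player_info>
</Msg_Roster>
</Game>
</Msg_file>"""


def rows_from_xml(root, row_path, value_path, attr):
    """Hypothetical stand-in for the row/value/attribute form of fromxml:
    one tuple per element matching row_path, holding the attr attribute of
    every child element matching value_path."""
    for row_elem in root.iterfind(row_path):
        yield tuple(v.get(attr) for v in row_elem.findall(value_path))


root = ET.fromstring(XML)

# Tag names are case-sensitive: the lowercase spelling matches nothing.
print(len(root.findall(".//player_info")), len(root.findall(".//Player_info")))
# → 0 3

# With no matching rows the iterator is empty, so asking it for a first
# (header) row raises StopIteration, the same error as in the traceback.
try:
    next(rows_from_xml(root, ".//player_info", "playercode", "PlayerCode"))
except StopIteration:
    print("StopIteration: no header row")

# A bare tag name only searches direct children; <Msg_Roster> sits under <Game>.
print(root.findall("Msg_Roster"))
# → []

# Row = <Msg_Roster>, values = its <Player_info> children, read from PlayerCode.
rows = list(rows_from_xml(root, ".//Msg_Roster", "Player_info", "PlayerCode"))
print(rows)
# → [('devin_harris', 'al_horford', 'john_jenkins')]

# One row per player instead: a header tuple, then one tuple per <Player_info>.
fields = ("Person_id", "Last_name", "PlayerCode")
table = [fields] + [tuple(p.get(f) for f in fields) for p in root.iter("Player_info")]
print(table[1])
# → ('2734', 'Harris', 'devin_harris')
```

Since there is a single Msg_Roster, the corrected call yields exactly one row, and because petl treats the first row of a table as its header, the result above is a header with no data rows. The last lines build one row per player instead; a list of tuples with the header first is a valid petl table container, so it could be wrapped with etl.wrap(table) (not run here).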