I am trying to parse my iTunes playlist, which is stored as XML.
Below is a sample of the XML I am trying to parse; I want the end result in a pandas DataFrame.
<dict>
<key>Track ID</key><integer>3636</integer>
<key>Size</key><integer>6661871</integer>
<key>Total Time</key><integer>211774</integer>
<key>Track Number</key><integer>4</integer>
<key>Track Count</key><integer>14</integer>
<key>Year</key><integer>2007</integer>
<key>Date Modified</key><date>2008-06-27T15:14:16Z</date>
<key>Date Added</key><date>2009-07-06T12:03:10Z</date>
<key>Bit Rate</key><integer>251</integer>
<key>Sample Rate</key><integer>44100</integer>
<key>Play Count</key><integer>5</integer>
<key>Play Date</key><integer>3373708724</integer>
<key>Play Date UTC</key><date>2010-11-27T13:18:44Z</date>
<key>Skip Count</key><integer>3</integer>
<key>Skip Date</key><date>2015-06-26T14:20:01Z</date>
<key>Persistent ID</key><string>E966DF081B4B40E1</string>
<key>Track Type</key><string>File</string>
<key>File Folder Count</key><integer>5</integer>
<key>Library Folder Count</key><integer>1</integer>
<key>Artist</key><string>Fall Out Boy</string>
<key>Album</key><string>Infinity On High</string>
<key>Genre</key><string>Rock</string>
<key>Kind</key><string>MPEG audio file</string>
</dict>
Here is the Python code I use to parse the XML:
from lxml import objectify
import pandas as pd
path = 'C:/Users/username/desktop/itunes.xml'
xml = objectify.parse(open(path))
root = xml.getroot()
tracks = root.getchildren()[0].getchildren()[15]
oddelements=tracks.getchildren()[1::2]
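The resulting "oddelements" object is a list of dict elements.
Each dict element in this list holds the information contained in the "dict" tag of the sample XML pasted above.
How can I parse this list of dict elements and unpack them into a pandas DataFrame for further analysis?
Any help is much appreciated.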
Answer 0: (score: 0)
Something like this should work:
import xml.etree.ElementTree as ET
import pandas as pd

root = ET.fromstring('<dict><key>Track ID</key><integer>3636</integer></dict>')

# Parse into a dictionary: each <key> names the value element that follows it
d = {}
k = ''
for t in root:
    if t.tag == 'key':
        k = t.text
        continue
    d[k] = t.text

# Transform to a DataFrame; list() keeps this working across pandas versions
df = pd.DataFrame(list(d.items()), columns=['key', 'value'])
print(df)
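The snippet above handles a single track and yields one row per key. For the question's actual goal, one row per track, the same key/value pairing can be applied to every "dict" element and the resulting dictionaries passed to pandas together. The following is only a sketch under assumptions: SAMPLE is a made-up two-track fragment imitating the key/dict alternation that the question slices with [1::2], and track_to_record is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Hypothetical sample: a parent <dict> whose children alternate between a
# <key> (the track ID) and a <dict> (the track itself).
SAMPLE = """
<dict>
  <key>3636</key>
  <dict>
    <key>Track ID</key><integer>3636</integer>
    <key>Artist</key><string>Fall Out Boy</string>
    <key>Year</key><integer>2007</integer>
  </dict>
  <key>3637</key>
  <dict>
    <key>Track ID</key><integer>3637</integer>
    <key>Artist</key><string>Fall Out Boy</string>
    <key>Genre</key><string>Rock</string>
  </dict>
</dict>
"""


def track_to_record(track):
    """Pair each <key> with the value element that follows it."""
    children = list(track)
    return {k.text: v.text for k, v in zip(children[0::2], children[1::2])}


tracks = ET.fromstring(SAMPLE)

# One dictionary per track; only direct <dict> children are tracks here.
records = [track_to_record(t) for t in tracks.findall('dict')]

# pandas aligns the dictionaries by key and fills missing fields with NaN.
df = pd.DataFrame(records)

# Every value arrives as a string, so numeric columns need converting.
df['Track ID'] = pd.to_numeric(df['Track ID'])
print(df)
```

With lxml the same helper should apply to the elements in "oddelements", since lxml elements can also be iterated over for their children. Note that empty value elements such as a boolean tag have no text, so they come through as None in this sketch.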