Parsing song names etc. from an XML file in Python

Time: 2015-10-05 10:43:06

Tags: python xml parsing

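I haven't asked a question on SO in a while for fear of getting flamed, but I'm really stuck.

Yes, I have looked at the other answers about parsing XML files. Through some combination of my incompetence and inexperience, I can't seem to parse the information out of an XML file that describes a playlist I made.

I have no experience parsing XML files, but I still don't see what I'm doing wrong. As a novice at this, I'd appreciate answers that teach rather than just recommend such-and-such library, because that is no way to learn.

Here is the top of the XML file I am parsing, along with its first song. Sorry I don't have a generic version of this; I suspect my problem may come from the design of this file, but I'm happy to be wrong.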

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Major Version</key><integer>1</integer>
    <key>Minor Version</key><integer>1</integer>
    <key>Date</key><date>2015-10-05T09:48:26Z</date>
    <key>Application Version</key><string>12.2.2.25</string>
    <key>Features</key><integer>5</integer>
    <key>Show Content Ratings</key><true/>
    <key>Library Persistent ID</key><string>4EFDC340CA35F4A8</string>
    <key>Tracks</key>
    <dict>
        <key>1467</key>
        <dict>
            <key>Track ID</key><integer>1467</integer>
            <key>Name</key><string>Gorgeous (feat. Kid Cudi &#38; Raekwon)</string>
            <key>Artist</key><string>Kanye West</string>
            <key>Album Artist</key><string>Kanye West</string>
            <key>Composer</key><string>Che Smith/Mike Dean/Corey Woods/Ernest Wilson/Roger McGuinn/Scott Mescudi/Gene Clark/Malik Jones</string>
            <key>Album</key><string>My Beautiful Dark Twisted Fantasy</string>
            <key>Genre</key><string>Hip Hop</string>
            <key>Year</key><integer>2010</integer>
        </dict>

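(I tried to remove as much of the irrelevant file information as possible.)

My immediate goal is to be able to print out some simply formatted information: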

Song Name: "Gorgeous" | Artist: "Kanye West" | Album: "MBDTF"

I have tried many things, mostly with ElementTree. Here is one of the pieces of code I tried:

from xml.etree import ElementTree

docroot = ElementTree.parse('MyPlaylist.xml').getroot()

for child in docroot:
    for dict in child:
        for a in dict:
            print(a.tag, a.attrib)

That just prints out the confusing

key {}
dict {}
key {}
dict {}
key {}
#and so on...

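I tried nesting further, but it returned the same or similar output.

Reminder: I have already looked at the other answers on "how to parse XML with Python", but I really don't understand them. I believe I have exhausted the resources at hand and need someone to give me some pointers. Thank you very much!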

1 answer:

Answer 0 (score: 1)

Below is simple code to get the song details from the dict tags. Comments are added in the code.

Input:

<plist version="1.0">
<dict>
    <key>Major Version</key><integer>1</integer>
    <key>Minor Version</key><integer>1</integer>
    <key>Date</key><date>2015-10-05T09:48:26Z</date>
    <key>Application Version</key><string>12.2.2.25</string>
    <key>Features</key><integer>5</integer>
    <key>Show Content Ratings</key><true/>
    <key>Library Persistent ID</key><string>4EFDC340CA35F4A8</string>
    <key>Tracks</key>
    <dict>
        <key>1467</key>
        <dict>
            <key>Track ID</key><integer>1467</integer>
            <key>Name</key><string>Gorgeous (feat. Kid Cudi &#38; Raekwon)</string>
            <key>Artist</key><string>Kanye West</string>
            <key>Album Artist</key><string>Kanye West</string>
            <key>Composer</key><string>Che Smith/Mike Dean/Corey Woods/Ernest Wilson/Roger McGuinn/Scott Mescudi/Gene Clark/Malik Jones</string>
            <key>Album</key><string>My Beautiful Dark Twisted Fantasy</string>
            <key>Genre</key><string>Hip Hop</string>
            <key>Year</key><integer>2010</integer>
        </dict>
    </dict>

    <dict>
        <key>1468</key>
        <dict>
            <key>Track ID</key><integer>1468</integer>
            <key>Name</key><string>test name</string>
            <key>Artist</key><string>test Artist</string>
            <key>Album Artist</key><string>test Album Artist</string>
            <key>Composer</key><string>test Composer</string>
            <key>Album</key><string>test Album</string>
            <key>Genre</key><string>test Genre</string>
            <key>Year</key><integer>2010</integer>
        </dict>
    </dict>
</dict>
</plist>

Demo:

import pprint
import xml.etree.ElementTree as ET

# `data` is assumed to hold the XML text shown above as a string.
# Parse the content with ElementTree's fromstring function.
root = ET.fromstring(data)
# Get all targeted dict tags from the content: plist -> dict -> dict.
target_dicts = root.findall("./dict/dict")
# Variable which stores the target information.
result_info = []
# Iterate over the target dict tags.
for i in target_dicts:
    # Find the nested dict tag inside this dict tag.
    target_dict = i.find("dict")
    # Temporary variable which saves all details of one track.
    dict_details = {}
    # Iterate over the children (getchildren() was removed in Python 3.9).
    for j in target_dict:
        # The structure is well defined: each key tag is followed by its value tag.
        if j.tag == "key":
            tag_key = j.text
        else:
            dict_details[tag_key] = j.text

    result_info.append({"Name": dict_details.get("Name", ""),
                        "Artist": dict_details.get("Artist", ""),
                        "Album": dict_details.get("Album", "")})

pprint.pprint(result_info)

Output:

[{'Album': 'My Beautiful Dark Twisted Fantasy',
  'Artist': 'Kanye West',
  'Name': 'Gorgeous (feat. Kid Cudi & Raekwon)'},
 {'Album': 'test Album', 'Artist': 'test Artist', 'Name': 'test name'}]
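
The input above wraps each track in its own outer dict, whereas the file in the question keeps every track as a key/dict pair inside a single "Tracks" dict. The following is a minimal sketch for that original layout, not a definitive implementation. It relies on the same observation as the demo: in a plist dict, each key element is immediately followed by its value element, which is why the question's loop only ever printed `key {}` and `dict {}` (the data lives in `.text`, not in `.attrib`). The names `SAMPLE`, `pairs`, `read_tracks` and `format_track` are hypothetical helpers introduced for illustration, and the second track is made-up test data.

```python
import xml.etree.ElementTree as ET

# A trimmed sample in the layout from the question: the "Tracks" <dict>
# holds alternating <key>/<dict> children, one pair per track.
SAMPLE = """<plist version="1.0">
<dict>
    <key>Major Version</key><integer>1</integer>
    <key>Tracks</key>
    <dict>
        <key>1467</key>
        <dict>
            <key>Track ID</key><integer>1467</integer>
            <key>Name</key><string>Gorgeous (feat. Kid Cudi &#38; Raekwon)</string>
            <key>Artist</key><string>Kanye West</string>
            <key>Album</key><string>My Beautiful Dark Twisted Fantasy</string>
        </dict>
        <key>1468</key>
        <dict>
            <key>Track ID</key><integer>1468</integer>
            <key>Name</key><string>test name</string>
            <key>Artist</key><string>test Artist</string>
            <key>Album</key><string>test Album</string>
        </dict>
    </dict>
</dict>
</plist>"""


def pairs(dict_elem):
    """Yield (key text, value element) for each key/value pair in a plist <dict>."""
    children = list(dict_elem)
    # Children alternate key, value, key, value, ... so pair even with odd positions.
    for key_elem, value_elem in zip(children[::2], children[1::2]):
        yield key_elem.text, value_elem


def read_tracks(root):
    """Return one {key: text} dict per track under the top-level "Tracks" key."""
    top = dict(pairs(root.find("dict")))
    tracks_elem = top.get("Tracks")
    if tracks_elem is None:
        return []
    return [
        {key: value.text for key, value in pairs(track_elem)}
        for _track_id, track_elem in pairs(tracks_elem)
    ]


def format_track(track):
    """Format one track in the style asked for in the question."""
    return 'Song Name: "{}" | Artist: "{}" | Album: "{}"'.format(
        track.get("Name", ""), track.get("Artist", ""), track.get("Album", "")
    )


tracks = read_tracks(ET.fromstring(SAMPLE))
lines = [format_track(t) for t in tracks]
for line in lines:
    print(line)
```

To run this against the real file, `ET.parse('MyPlaylist.xml').getroot()` could replace `ET.fromstring(SAMPLE)`. Once the key/value pairing is clear, note that the standard library's `plistlib` module also reads XML property lists and returns ordinary Python dicts.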