Python 2.7.2: plistlib with iTunes XML

Time: 2013-08-16 13:55:08

Tags: python xml python-2.7 unicode itunes

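I am reading an iTunes-generated XML playlist with plistlib. The XML has a UTF-8 header.

When I read the XML with plistlib, I get both unicode (e.g. 'Name': u'Don\u2019t You Remember') and byte strings (e.g. 'Name': 'Where Eagles Dare').

The standard advice is to decode whatever you read with the correct encoding as early as possible, and to use unicode everywhere inside the program. However,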

unicode_string.decode('utf8') 

fails (as it should):
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 3: ordinal not in range(128)
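
The exception type is the clue here. As far as I know, Python 2 handles `decode` on a unicode object by first encoding it to a byte string with the default ASCII codec and only then decoding, so the failure happens in that hidden encode step. A minimal sketch that spells the two steps out explicitly (the helper name `ascii_then_utf8` is made up for illustration; the code runs on Python 2 and 3):

```python
title = u'Don\u2019t You Remember'


def ascii_then_utf8(text):
    """Spell out the two steps: encode with ASCII first, then decode as UTF-8."""
    as_bytes = text.encode('ascii')   # raises for any non-ASCII character
    return as_bytes.decode('utf8')


try:
    ascii_then_utf8(title)
    outcome = 'ok'
except UnicodeEncodeError as exc:
    # The error comes from the encode step, not from the UTF-8 decode.
    outcome = str(exc)

print(outcome)
```

A pure-ASCII title such as u'Where Eagles Dare' passes through both steps unchanged, which is why the problem only shows up for some tracks.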

The solution seems to be:

for name in names:
    if isinstance(name, str):
        name = name.decode('utf8')
    # etc.

Is this the right way to handle the problem? Is there a better way?

I am on Windows 7.

Edit:

Reading the XML:

import plistlib
xml = plistlib.readPlist(fn)
for track in xml['Tracks']:
    info = xml['Tracks'][track]
    info['Name']

IDLE produces:

u'Don\u2019t You Remember'
'Where Eagles Dare'

Here is the XML file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Major Version</key><integer>1</integer>
    <key>Minor Version</key><integer>1</integer>
    <key>Date</key><date>2013-08-14T15:04:27Z</date>
    <key>Application Version</key><string>10.6.3</string>
    <key>Features</key><integer>5</integer>
    <key>Show Content Ratings</key><true/>
    <key>Music Folder</key><string>file://localhost/C:/Users/rdp/Music/iTunes/iTunes%20Media/</string>
    <key>Library Persistent ID</key><string>FE28CCACD9A36C34</string>
    <key>Tracks</key>
    <dict>
        <key>1019</key>
        <dict>
            <key>Track ID</key><integer>1019</integer>
            <key>Name</key><string>Where Eagles Dare</string>
            <key>Artist</key><string>Iron Maiden</string>
            <key>Album</key><string>Piece Of Mind</string>
            <key>Genre</key><string>Rock</string>
            <key>Kind</key><string>MPEG audio file</string>
            <key>Size</key><integer>7372755</integer>
            <key>Total Time</key><integer>370128</integer>
            <key>Track Number</key><integer>1</integer>
            <key>Year</key><integer>1983</integer>
            <key>Date Modified</key><date>2009-10-07T21:11:31Z</date>
            <key>Date Added</key><date>2008-02-07T16:04:15Z</date>
            <key>Bit Rate</key><integer>153</integer>
            <key>Sample Rate</key><integer>44100</integer>
            <key>Play Count</key><integer>4</integer>
            <key>Play Date</key><integer>3414416760</integer>
            <key>Play Date UTC</key><date>2012-03-12T21:06:00Z</date>
            <key>Artwork Count</key><integer>1</integer>
            <key>Persistent ID</key><string>FE28CCACD9A383E5</string>
            <key>Track Type</key><string>File</string>
            <key>Location</key><string>file://localhost/D:/music/Iron%20Maiden/Piece%20Of%20Mind/01%20Where%20Eagles%20Dare.mp3</string>
            <key>File Folder Count</key><integer>-1</integer>
            <key>Library Folder Count</key><integer>-1</integer>
        </dict>
        <key>11559</key>
        <dict>
            <key>Track ID</key><integer>11559</integer>
            <key>Name</key><string>Don’t You Remember</string>
            <key>Artist</key><string>Adele</string>
            <key>Album</key><string>21</string>
            <key>Genre</key><string>Pop</string>
            <key>Kind</key><string>MPEG audio file</string>
            <key>Size</key><integer>6120028</integer>
            <key>Total Time</key><integer>229511</integer>
            <key>Track Number</key><integer>4</integer>
            <key>Track Count</key><integer>11</integer>
            <key>Year</key><integer>2011</integer>
            <key>Date Modified</key><date>2012-11-17T10:50:31Z</date>
            <key>Date Added</key><date>2012-12-19T16:03:46Z</date>
            <key>Bit Rate</key><integer>199</integer>
            <key>Sample Rate</key><integer>44100</integer>
            <key>Artwork Count</key><integer>1</integer>
            <key>Persistent ID</key><string>7130C888606FB153</string>
            <key>Track Type</key><string>File</string>
            <key>Location</key><string>file://localhost/D:/music/Adele/21/04%20-%20Don%E2%80%99t%20You%20Remember.mp3</string>
            <key>File Folder Count</key><integer>-1</integer>
            <key>Library Folder Count</key><integer>-1</integer>
        </dict>
    </dict>
    <key>Playlists</key>
    <array>
        <dict>
            <key>Name</key><string>short</string>
            <key>Playlist ID</key><integer>30888</integer>
            <key>Playlist Persistent ID</key><string>166746C6572B0005</string>
            <key>All Items</key><true/>
            <key>Playlist Items</key>
            <array>
                <dict>
                    <key>Track ID</key><integer>11559</integer>
                </dict>
                <dict>
                    <key>Track ID</key><integer>1019</integer>
                </dict>
            </array>
        </dict>
    </array>
</dict>
</plist>

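1 Answer:

Answer 0 (score: 1)

Wow, that is really odd behaviour. I would even call this inconsistency a bug in the 2.x implementation of plistlib. plistlib in Python 3 always returns unicode strings, which is much better.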
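
For comparison, a small sketch of the Python 3 behaviour, assuming Python 3.4 or later where `plistlib.loads` is available. The two-track document is a cut-down stand-in for the playlist in the question, not the real file:

```python
import plistlib

# Cut-down plist with one ASCII-only title and one containing U+2019.
PLIST = (
    u'<?xml version="1.0" encoding="UTF-8"?>\n'
    u'<plist version="1.0">\n'
    u'<dict>\n'
    u'  <key>Tracks</key>\n'
    u'  <dict>\n'
    u'    <key>1019</key>\n'
    u'    <dict><key>Name</key><string>Where Eagles Dare</string></dict>\n'
    u'    <key>11559</key>\n'
    u'    <dict><key>Name</key><string>Don\u2019t You Remember</string></dict>\n'
    u'  </dict>\n'
    u'</dict>\n'
    u'</plist>\n'
).encode('utf-8')

library = plistlib.loads(PLIST)  # Python 3.4+ only
names = [track['Name'] for track in library['Tracks'].values()]

# Every title has the same type here, whether or not it is pure ASCII.
for name in names:
    print(type(name).__name__, repr(name))
```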

But you have to live with it :) So the answer to your question is yes: you should always guard against this when reading strings from a plist:

def safe_unicode(s):
    if isinstance(s, unicode):
        return s
    return s.decode('utf-8', errors='replace')

value = safe_unicode(info['Name'])

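I added errors='replace' in case the string is not UTF-8 encoded. If it cannot be decoded, you will get a bunch of \ufffd characters. If you would rather get an exception, drop it and use s.decode('utf-8').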
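
If you would rather not call a helper at every lookup, one option is to normalise the whole parsed structure once, right after reading it. The following is a sketch under the assumption that the plist contains only dicts, lists, strings and scalar values; `to_unicode` is a hypothetical helper name, and unlike errors='replace' it raises on bytes that are not valid UTF-8:

```python
def to_unicode(obj, encoding='utf-8'):
    """Return a copy of obj with every byte string decoded to text."""
    if isinstance(obj, bytes):  # bytes is the same type as str on Python 2
        return obj.decode(encoding)
    if isinstance(obj, dict):
        # Keys can be byte strings too, so convert both keys and values.
        return dict((to_unicode(key, encoding), to_unicode(value, encoding))
                    for key, value in obj.items())
    if isinstance(obj, list):
        return [to_unicode(item, encoding) for item in obj]
    return obj  # numbers, booleans, dates, text: left untouched


# Hand-built stand-in for a parsed playlist with mixed string types.
sample = {
    b'Tracks': {
        b'1019': {b'Name': b'Where Eagles Dare', b'Year': 1983},
        u'11559': {u'Name': u'Don\u2019t You Remember', b'Year': 2011},
    },
    b'Playlists': [{b'Name': b'short'}],
}
clean = to_unicode(sample)
```

The result is built from plain dicts and lists, so anything that relied on the parser's own container types would need adjusting.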

Update

When I tried it with ElementTree:

from xml.etree import ElementTree as et
tree = et.parse('test.plist')
map(lambda x: x.text, tree.findall('dict/dict/dict')[1].findall('string'))

Which gave me:

[u'Don\u2019t You Remember',
 'Adele',
 '21',
 'Pop',
 'MPEG audio file',
 '7130C888606FB153',
 'File',
 'file://localhost/D:/music/Adele/21/04%20-%20Don%E2%80%99t%20You%20Remember.mp3']

So it is a mix of unicode and byte strings :-/
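
The output is consistent with the parser handing back a byte string whenever the text is pure ASCII and a unicode object otherwise. A sketch of that rule, offered as an assumption about what is going on rather than as the actual library code (`ascii_or_unicode` is a made-up name):

```python
def ascii_or_unicode(text):
    """Return ASCII bytes when text is pure ASCII, else the text unchanged."""
    try:
        return text.encode('ascii')
    except UnicodeError:
        return text


values = [ascii_or_unicode(t)
          for t in (u'Don\u2019t You Remember', u'Adele', u'21')]

# Only the first value keeps its text type; the others become byte strings.
for value in values:
    print(type(value).__name__, repr(value))
```

If this is what happens, the type of each value depends only on whether it contains a non-ASCII character, which matches the results for both plistlib and ElementTree.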