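Unicode error from an already decoded string?

Time: 2014-11-29 14:29:49

Tags: python json unicode python-requests

I am running into a problem when joining strings that I have already decoded earlier in my code: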

import json
import requests
import jsonobject

for i in range(0, 3): #for loop to feed parameter to url params

    if i == 0:
        var = "0"
        var2 = "Home"
    elif i == 1:
        var = "1"
        var2 = "Away"
    elif i == 2:
        var = "2"
        var2 = "Overall"

    url = 'http://www.whoscored.com/StatisticsFeed/1/GetPlayerStatistics'
    params = {
            'category': 'tackles',
            'subcategory': 'success',
            'statsAccumulationType': '0',
            'isCurrent': 'true',
            'playerId': '',
            'teamIds': '',
            'matchId': '',
            'stageId': '9155',
            'tournamentOptions': '2',
            'sortBy': 'Rating',
            'sortAscending': '',
            'age': '',
            'ageComparisonType': '',
            'appearances': '',
            'appearancesComparisonType': '0',
            'field': var2, #from for loop
            'nationality': '',
            'positionOptions': "'FW','AML','AMC','AMR','ML','MC','MR','DMC','DL','DC','DR','GK','Sub'",
            'timeOfTheGameEnd': '5',
            'timeOfTheGameStart': '0',
            'isMinApp': '',
            'page': '1',
            'includeZeroValues': '',
            'numberOfPlayersToPick': '10'
            }

    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36',
           'X-Requested-With': 'XMLHttpRequest',
           'Host': 'www.whoscored.com',
           'Referer': 'http://www.whoscored.com/'}

    responser = requests.get(url, params=params, headers=headers)
    responser = responser.json()
    playerTableStats = responser[u'playerTableStats']

    for statDict in playerTableStats:

        mylookup = ("{name},{firstName},{lastName},{positionText},{tournamentId},{tournamentShortName},{regionCode}"
            "{tournamentRegionId},{seasonId},{seasonName},{teamName},{teamId},{playerId}"
            "{minsPlayed},{ranking},{rating:.2f},{apps},{weight:.2f},{height:.2f},{playedPositions}"
            "{isManOfTheMatch},{isOpta},{subOn},".decode('cp1252').format(**statDict)) #generates none match data about players
        print mylookup


        mykey2 = (var2)
        print mykey2

        mykey3 = {}
        #create dynamic variables and join match and none match data together
        mykey3[mykey2] = ("{challengeLost:.2f},{tackleWonTotal:.2f},{tackleTotalAttempted:.2f},".decode('cp1252').format(**statDict))
        print mykey3[mykey2]
        mykey3[mykey2] = mykey3[mykey2],'*,'
        mykey3[mykey2] = str(''.join(mykey3[mykey2][0:2]))
        mykey3[mykey2] = mylookup,mykey3[mykey2]
        mykey3[mykey2] = str(''.join(mykey3[mykey2][0:2]))
        print mykey3[mykey2]

        mykey3[mykey2] = mykey3[mykey2],'*,'
        mykey3[mykey2] = str(''.join(mykey3[mykey2][0:2]))

I get this error message:

Traceback (most recent call last):
  File "C:\Python27\counter.py", line 72, in <module>
    mykey3[mykey2] = str(''.join(mykey3[mykey2][0:2]))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 6: ordinal not in range(128)

This happens when the loop over the list of player names reaches the name Cesc Fàbregas. I have tried modifying the code above to:

mykey3[mykey2] = mykey3[mykey2],'*,'
mykey3[mykey2] = str(''.join(mykey3[mykey2][0:2]).decode('cp1252'))

...or:

mykey3[mykey2] = mykey3[mykey2],'*,'
mykey3[mykey2] = str(''.join(mykey3[mykey2][0:2])).decode('cp1252')

...but this still produces the same error.

Can anyone see what I am doing wrong?

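1 Answer:

Answer 0: (score: 1)

You are joining two values with a comma in a very roundabout way: you create a tuple and then convert the tuple back into a string. Don't do that; just use string formatting.

You want to use a Unicode literal rather than decoding a byte string: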

mykey3[mykey2] = u"{challengeLost:.2f},{tackleWonTotal:.2f},{tackleTotalAttempted:.2f},".format(**statDict)

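Note the u prefix on the string. You are not actually using any non-ASCII characters in the string literal, so you don't even need to declare a source encoding there.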
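As a minimal sketch of the Unicode-literal approach, the snippet below formats a record whose name contains a non-ASCII character. The `stat_dict` here is a hypothetical, cut-down stand-in for one entry of `playerTableStats`, and `template` and `line` are illustrative names; the code is written so that it should behave the same on Python 2.7 and Python 3.

```python
# Hypothetical sample record; the real feed returns many more fields.
stat_dict = {
    "name": u"Cesc F\xe0bregas",
    "challengeLost": 2.83,
    "tackleWonTotal": 1.17,
    "tackleTotalAttempted": 4.0,
}

# A Unicode literal: no .decode('cp1252') is needed on the template.
template = u"{name},{challengeLost:.2f},{tackleWonTotal:.2f},{tackleTotalAttempted:.2f},"

# The result stays a Unicode (text) string, so the non-ASCII name is preserved.
line = template.format(**stat_dict)
```

Because the template is already text, the formatted result is text as well, and nothing forces an implicit ASCII conversion.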

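But it is your use of a tuple followed by str() that causes the exception. Just don't use str() here; you are converting the joined Unicode string back into a byte string, then joining that byte string with a Unicode value and converting the result to a byte string once more, and that conversion fails: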

>>> mylookup = ("{name},{firstName},{lastName},{positionText},{tournamentId},{tournamentShortName},{regionCode}"
...             "{tournamentRegionId},{seasonId},{seasonName},{teamName},{teamId},{playerId}"
...             "{minsPlayed},{ranking},{rating:.2f},{apps},{weight:.2f},{height:.2f},{playedPositions}"
...             "{isManOfTheMatch},{isOpta},{subOn},".decode('cp1252').format(**statDict))
>>> ''.join(mykey3[mykey2][0:2])
u'Cesc F\xe0bregas,Cesc,F\xe0bregas,Midfielder,2,EPL,es252,4311,2014/2015,Chelsea,15,8040532,5,8.09,6,74.00,175.00,-FW-MC-ML-MR-False,True,0,2.83,1.17,4.00,*,*,2.83,1.17,4.00,*,'
>>> str(''.join(mykey3[mykey2][0:2]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 6: ordinal not in range(128)

Note that the join itself works; it is the str() call that converts the Unicode value back into a byte string without an explicit codec.
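The failure can be reproduced in isolation. In Python 2, calling str() on a Unicode value encodes it with the default codec, which is ASCII unless it has been changed; the sketch below makes that step explicit with `.encode("ascii")` so it should run on both Python 2.7 and Python 3. The variable names are illustrative only.

```python
name = u"Cesc F\xe0bregas"

# Joining text with text works and yields text.
joined = u"".join([name, u",*,"])

# Encoding to ASCII is what Python 2's str() attempts implicitly,
# and it cannot represent u'\xe0'.
try:
    joined.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# With an explicit codec that covers the character, encoding succeeds.
utf8_bytes = joined.encode("utf-8")
```

If a byte string really is needed, for example when writing to a file, encoding explicitly at that point is one option; until then the value can stay Unicode.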

The following also joins the two (Unicode) strings with a comma; note that join() takes a single iterable:

mykey3[mykey2] = u','.join([mykey3[mykey2], u'*,'])

Or simply append to the existing string:

mykey3[mykey2] += u',*,'

Or use a single string formatting operation to put all the data into one string from the start:

mylookup = (
    u"{name},{firstName},{lastName},{positionText},{tournamentId},{tournamentShortName},{regionCode}"
    u"{tournamentRegionId},{seasonId},{seasonName},{teamName},{teamId},{playerId}"
    u"{minsPlayed},{ranking},{rating:.2f},{apps},{weight:.2f},{height:.2f},{playedPositions}"
    u"{isManOfTheMatch},{isOpta},{subOn},"
    u"{challengeLost:.2f},{tackleWonTotal:.2f},{tackleTotalAttempted:.2f},"
    u"*,*,".format(**statDict)
)
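To show the single-format approach applied to several records, here is a short sketch. `player_stats` is hypothetical data standing in for `responser[u'playerTableStats']`, the second record is invented purely for illustration, and `ROW_TEMPLATE` and `format_row` are names introduced here, not part of the original code. Only a few of the fields are used to keep it brief.

```python
# Hypothetical records; the second one is made up for illustration.
player_stats = [
    {"name": u"Cesc F\xe0bregas", "challengeLost": 2.83,
     "tackleWonTotal": 1.17, "tackleTotalAttempted": 4.0},
    {"name": u"Example Player", "challengeLost": 1.0,
     "tackleWonTotal": 2.5, "tackleTotalAttempted": 3.5},
]

# One Unicode template holds every field plus the trailing '*,*,' markers.
ROW_TEMPLATE = (
    u"{name},"
    u"{challengeLost:.2f},{tackleWonTotal:.2f},{tackleTotalAttempted:.2f},"
    u"*,*,"
)


def format_row(stat):
    """Build one output row as text in a single formatting pass."""
    return ROW_TEMPLATE.format(**stat)


rows = [format_row(stat) for stat in player_stats]
```

Each row is built once, with no tuples, no str() calls, and no intermediate joins, so there is no point at which an implicit ASCII conversion can occur.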