python: Getting UnicodeEncodeError when trying to parse a list

Time: 2014-07-08 03:15:08

Tags: python unicode ascii

I'm trying to run a list that I scraped from http://www.ropeofsilicon.com/roger-eberts-great-movies-list/ through http://www.omdbapi.com/ to get the movies' IMDB IDs.

I'm creating logs for the movies I can and cannot find, as follows:

import requests

OMDBPath = "http://www.omdbapi.com/"

movieFile = open("movies.txt")
foundLog = open("log_found.txt", 'w')
notFoundLog = open("log_not_found.txt", 'w')

####

for line in movieFile:
    name = line.split('(')[0].decode('utf8')
    print name
    year = False
    if line.find('(') != -1:
        year = line[line.find('(')+1 : line.find(')')].decode('utf8')
        OMDBQuery = {'t': name, 'y': year}
    else:
        OMDBQuery = {'t': name}

    req = requests.get(OMDBPath, params=OMDBQuery)
    if req.json()[u'Response'] == "False":
        if year:
            notFoundLog.write("Couldn't find " + name + " (" + year + ")" + "\n")
        else:
            notFoundLog.write("Couldn't find " + name + "\n")
    # else:
    #     print req.json()
    #     foundLog.write(req.text.decode('utf8').encode('latin1') + ",")
movieFile.close()
foundLog.close()
notFoundLog.close()

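I've read a lot about unicode encoding and decoding, and it looks like this is happening because I'm not encoding the file the right way? I'm not sure what's wrong here. I run into the problem when I get to "Caché":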

Caché
Traceback (most recent call last):
  File "app.py", line 34, in <module>
    notFoundLog.write("Couldn't find " + name + " (" + year + ")" + "\n")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 18: ordinal not in range(128)

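1 Answer:

Answer 0: (score: 1)

Here is a working solution that relies on the codecs module to provide transparent encoding to and decoding from UTF-8 for the various files you open: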

import requests
import codecs

OMDBPath = "http://www.omdbapi.com/"

with codecs.open("movies.txt", encoding='utf-8') as movieFile, \
     codecs.open("log_found.txt", 'w', encoding='utf-8') as foundLog, \
     codecs.open("log_not_found.txt", 'w', encoding='utf-8') as notFoundLog:
    for line in movieFile:
        name = line.split('(')[0]
        print(name)
        year = False
        if line.find('(') != -1:
            year = line[line.find('(')+1 : line.find(')')]
            OMDBQuery = {'t': name, 'y': year}
        else:
            OMDBQuery = {'t': name}

        req = requests.get(OMDBPath, params=OMDBQuery)
        if req.json()[u'Response'] == "False":
            if year:
                notFoundLog.write(u"Couldn't find {} ({})\n".format(name, year))
            else:
                notFoundLog.write(u"Couldn't find {}\n".format(name))
        #else:
            #print(req.json())
            #foundLog.write(u"{},".format(req.text))
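
As background on why this helps: in Python 2, a file opened with the plain built-in open expects byte strings, so writing a unicode object triggers an implicit encode with the ASCII codec, and that is the step that fails on the é in "Caché". The following is only a minimal sketch that reproduces the encoding step in isolation (the variable names are made up for illustration and are not from the original script):

```python
# The message the question's script tried to write, as a unicode string.
# u'\xe9' is the character 'é', written as an escape to match the traceback.
message = u"Couldn't find Cach\xe9"

# Encoding with the ASCII codec fails, just like the implicit encode
# performed when a unicode object is written to a byte-oriented file.
try:
    message.encode("ascii")
    failed_at = None
except UnicodeEncodeError as exc:
    failed_at = exc.start  # index of the first character that cannot be encoded

# Encoding with UTF-8 succeeds; 'é' becomes the two bytes 0xC3 0xA9.
utf8_bytes = message.encode("utf-8")

print(failed_at)
print(repr(utf8_bytes))
```

The reported index lines up with the "position 18" in the question's traceback, since "Couldn't find Cach" is 18 characters long. A file object from codecs.open performs the UTF-8 encode for you on every write, so the ASCII fallback never happens.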

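Note that the codecs module is only needed on Python 2.x. On Python 3.x, the built-in open function does the encoding and decoding itself and should handle this correctly; passing encoding='utf-8' explicitly is still the safer choice, because the default encoding depends on the platform's locale.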
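
If the same script has to run on both Python 2 and Python 3, one option is io.open, which accepts an encoding argument on both (on Python 3 it is the same function as the built-in open). Below is a rough sketch of the logging part only, with the OMDb request left out; parse_line and write_not_found are hypothetical helper names, and the sample lines are invented for illustration rather than taken from the original movies.txt:

```python
import io
import os
import tempfile


def parse_line(line):
    """Split a 'Title (Year)' line into (title, year); year is None if absent."""
    start = line.find("(")
    end = line.find(")", start + 1)
    if start == -1 or end == -1:
        return line.strip(), None
    return line[:start].strip(), line[start + 1:end].strip()


def write_not_found(path, entries):
    """Write one 'Couldn't find ...' line per (title, year) pair as UTF-8."""
    with io.open(path, "w", encoding="utf-8") as log:
        for title, year in entries:
            if year:
                log.write(u"Couldn't find {} ({})\n".format(title, year))
            else:
                log.write(u"Couldn't find {}\n".format(title))


# Sample input for illustration only (not the contents of the real list).
sample_lines = [u"Cach\xe9 (2005)\n", u"Some Title\n"]
entries = [parse_line(line) for line in sample_lines]

log_path = os.path.join(tempfile.mkdtemp(), "log_not_found.txt")
write_not_found(log_path, entries)

# Read the log back with the same encoding to confirm the round trip.
with io.open(log_path, encoding="utf-8") as log:
    contents = log.read()
print(repr(contents))
```

Unlike the original code, this sketch strips surrounding whitespace from the title and year, so the log lines do not carry the trailing space or newline that line.split('(')[0] leaves behind.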