Python cannot decode website: UnicodeEncodeError

Date: 2014-01-15 17:23:26

Tags: python decode urllib

I am having trouble with decoding in Python. I am trying to fetch an IMDB page (example address: http://www.imdb.com/title/tt2216240/):

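import urllib.request

address = 'http://www.imdb.com/title/tt2216240/'
req = urllib.request.Request(address)
response = urllib.request.urlopen(req)
# Decode the response bytes to str, dropping any invalid byte sequences
page = response.read().decode('utf-8', 'ignore')
# No encoding is given here, so the platform default is used for writing
with open('film.html', 'w') as f:
    print(page, file=f)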

I get this error:

UnicodeEncodeError: 'charmap' codec can't encode character '\xe6' in position 4132: character maps to <undefined>

2 answers:

Answer 0: (score: 0)

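The error is raised when the page is written, not when it is decoded: open() without an encoding argument falls back to the platform's default encoding, and the 'charmap' codec in the message indicates a legacy code page that cannot represent every character on the page. Try specifying UTF-8 as the file encoding explicitly: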

with open('film.html', 'w', encoding='utf-8') as f:
    print(page, file=f)
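
To illustrate why the explicit encoding helps, here is a minimal sketch that needs no network access. It assumes cp1250 as a stand-in for the asker's default code page (the question does not say which one was in use) and uses a made-up sample string containing 'æ' (U+00E6), the character named in the traceback. The temporary file path is also just for illustration.

```python
import os
import tempfile

# Sample text containing 'æ' (U+00E6), the character from the traceback
text = "Kj\u00e6rlighet"

# A legacy single-byte code page (cp1250 is only an assumed example)
# has no mapping for 'æ', so encoding fails much like in the question.
try:
    text.encode("cp1250")
    legacy_ok = True
except UnicodeEncodeError:
    legacy_ok = False

# UTF-8 can represent every Unicode code point, so this write succeeds.
path = os.path.join(tempfile.mkdtemp(), "film.html")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the file back with the same encoding to confirm nothing was lost.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

Running locale.getpreferredencoding(False) shows which default open() would otherwise pick on a given machine.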

Answer 1: (score: 0)

Have you tried the requests library?

Either way, it makes this simpler:

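# samplerequest.py
import requests

address = "http://www.imdb.com/title/tt2216240/"
req = requests.get(address)

# Python 3 print() calls, matching the urllib.request code in the question
print(req.text)      # response body decoded to str
print(req.encoding)  # encoding requests used to decode the body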
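
If the goal is only to save the page to disk, one option is to skip text decoding altogether and write the raw bytes in binary mode, so that no platform-dependent encoding step is involved. The sketch below is an assumption-laden illustration, not part of the original answer: save_page is a hypothetical helper, and sample_content stands in for the bytes that requests.get(address).content would return, so the example runs without network access.

```python
import os
import tempfile


def save_page(content: bytes, path: str) -> None:
    """Write already-encoded bytes to disk without re-encoding them."""
    with open(path, "wb") as f:
        f.write(content)


# Stand-in for requests.get(address).content, which returns the body as bytes
sample_content = "<title>Kj\u00e6rlighet</title>".encode("utf-8")

path = os.path.join(tempfile.mkdtemp(), "film.html")
save_page(sample_content, path)

# Read the bytes back to confirm they were stored unchanged.
with open(path, "rb") as f:
    saved = f.read()
```

With a real response this would be called as save_page(req.content, 'film.html'). The file then keeps whatever encoding the server sent, which is usually what a browser expects when opening it.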