I have a URL like this:
http://idebate.org/debatabase/debates/constitutional-governance/house-supports-dalai-lama%E2%80%99s-%E2%80%98third-way%E2%80%99-tibet
Then I use the following script in Python to decode this URL:
full_href = urllib.unquote(full_href.encode('ascii')).decode('utf-8')
However, I get this error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 89: ordinal not in range(128)
when trying to write the result to a file.
Answer 0 (score: 0)
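As @KevinJ.Chase pointed out, you are most likely trying to write a string containing non-ASCII characters to a file through the incompatible ascii codec.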
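To illustrate the failure described above, here is a minimal sketch that reproduces the same kind of error without any file involved. The sample string is an assumption made up for illustration (a short fragment containing the same U+2019 character), not the asker's actual data.

```python
# A short unicode string containing U+2019 (right single quotation mark),
# the same character named in the error message. The string itself is
# a made-up fragment for illustration.
text = u'dalai-lama\u2019s'

try:
    # The ascii codec only covers code points 0..127, so this raises.
    text.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

print(error)
```

Writing a unicode string to a file whose encoding is ascii goes through the same encode step, which is why the error only shows up at write time.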
You can either change the encoding you write the file with, or leave full_href as an encoded byte string, like so:
# don't decode again to utf-8
full_href = urllib.unquote(url.encode('ascii'))
# ... then write to your file stream
or
...
# encode your string to a compatible encoding on write, i.e. utf-8
with open('yourfilenamehere', 'w') as f:
    f.write(full_href.encode('utf-8'))
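For reference, the two options above can be combined into one end-to-end sketch: decode the percent-escapes to unicode text, then write through a file object that has an explicit UTF-8 encoding. This is only one possible arrangement, not necessarily what the asker's script looks like. The output path and the `unquote_bytes` alias are hypothetical names introduced here; the import fallback is an assumption so that the same lines run on both Python 2 and Python 3.

```python
import io
import os
import tempfile

try:
    # Python 3: returns bytes for a str or bytes argument
    from urllib.parse import unquote_to_bytes as unquote_bytes
except ImportError:
    # Python 2: returns a byte string when given a byte string
    from urllib import unquote as unquote_bytes

url = ("http://idebate.org/debatabase/debates/constitutional-governance/"
       "house-supports-dalai-lama%E2%80%99s-%E2%80%98third-way%E2%80%99-tibet")

# Percent-decode to raw bytes, then interpret those bytes as UTF-8 text.
full_href = unquote_bytes(url).decode('utf-8')

# Hypothetical output location, used only for this illustration.
out_path = os.path.join(tempfile.gettempdir(), 'decoded_url.txt')

# io.open accepts an encoding on both Python 2 and 3, so the file object
# does the encoding and the ascii codec is never involved.
with io.open(out_path, 'w', encoding='utf-8') as f:
    f.write(full_href)

# Read the file back to check that the text survived the round trip.
with io.open(out_path, encoding='utf-8') as f:
    round_trip = f.read()
```

Letting the file object handle the encoding means full_href can stay as unicode text everywhere else in the program, with bytes appearing only at the file boundary.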