UnicodeEncodeError when printing parsed website content (Python 3)

Asked: 2018-09-07 08:28:52

Tags: python-3.x unicode

I am trying to parse some of a website's content with a Python 3 script, and I am running into a 'UnicodeEncodeError':

import urllib.request

myurl = "https://stackoverflow.com/"
with urllib.request.urlopen(myurl) as url:
    html = url.read()  # raw response body (bytes)
    print(type(html))
    content = html.decode("UTF-8", "ignore")  # skip bytes that are not valid UTF-8
    print(type(content))
    print(content)

This produces:

<class 'bytes'>
<class 'str'>
  File "C:\Python3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u200b' in position 688: character maps to <undefined>

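So it is not the decoding itself that fails (the second print call still runs), yet the decoded string apparently still contains Unicode characters that should have been ignored?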

Am I misreading the docs on this?
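
The traceback points at cp1252.py's encode function, which suggests the failure happens when print() encodes the string for the console, not during decode(). A minimal sketch of that distinction follows. The sample bytes and the in-memory stream are made up for illustration, and cp1252 is assumed as the console encoding because it appears in the traceback:

```python
import io

# Hypothetical sample: "café", then U+200B (valid UTF-8: E2 80 8B),
# then " ok ", then one byte (0xFF) that is not valid UTF-8.
raw = b"caf\xc3\xa9\xe2\x80\x8b ok \xff"

# errors="ignore" on decode only drops bytes that cannot be decoded (0xFF here).
# U+200B is a perfectly valid character, so it stays in the string.
content = raw.decode("utf-8", "ignore")
has_zwsp = "\u200b" in content

# print() must encode the str for the output stream; cp1252 has no U+200B.
try:
    content.encode("cp1252")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

# Possible workaround: write through a text stream whose *encode* error
# handler is lenient. An in-memory buffer stands in for the console here.
buffer = io.BytesIO()
out = io.TextIOWrapper(buffer, encoding="cp1252", errors="replace", newline="\n")
print(content, file=out)
out.flush()
written = buffer.getvalue()
```

On Python 3.7 and later, sys.stdout.reconfigure(errors="replace") should give the real console stream the same lenient behaviour. Whether replacing characters is acceptable depends on what the output is used for.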

0 answers:

No answers yet