I'm trying to parse some content from a website with a Python 3 script, and I'm running into a `UnicodeEncodeError`:
import urllib.request
myurl = "https://stackoverflow.com/"
with urllib.request.urlopen(myurl) as url:
    html = url.read()
print(type(html))
content = html.decode("UTF-8", "ignore")
print(type(content))
print(content)
This produces:
<class 'bytes'>
<class 'str'>
  File "C:\Python3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u200b' in position 688: character maps to <undefined>
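So it isn't the decoding itself that fails (the second print call still runs). But somehow the decoded string still contains the Unicode characters that were supposed to be ignored?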
Am I misreading the docs on this?
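The traceback involves two separate steps: decoding the bytes into a `str`, and encoding that `str` again when `print()` writes to the console. The sketch below is minimal and separates the two steps. It uses a made-up byte string containing U+200B (ZERO WIDTH SPACE, the character named in the error). It assumes the console encoding is cp1252, which the `cp1252.py` path in the traceback suggests.

```python
# Hypothetical input: the UTF-8 bytes for "a", U+200B (ZERO WIDTH SPACE), "b".
raw = "a\u200bb".encode("utf-8")  # b'a\xe2\x80\x8bb'

# Step 1: decoding. These bytes are valid UTF-8, so the "ignore" error
# handler has nothing to discard and U+200B is kept in the resulting str.
text = raw.decode("utf-8", "ignore")
print(repr(text))  # repr() escapes U+200B, so this line prints safely

# Step 2: encoding. print() has to encode the str for the console; cp1252
# (assumed here, based on the traceback) has no mapping for U+200B.
try:
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    print("encode failed:", exc.reason)

# An error handler only affects the call it is passed to, so "ignore"
# would have to be given on the encoding side to drop the character there.
print(repr(text.encode("cp1252", "ignore")))
```

With these inputs, `decode(..., "ignore")` does not raise, and the string it returns still contains `'\u200b'`. The exception only appears when that string is encoded to cp1252.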