Encoding error when downloading HTML from a URL with Python

Posted: 2017-05-30 18:07:51

Tags: python html encoding

I ran into a problem running this Python code:

import requests
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
#url1='https://www.nytimes.com/store/west-side-highway-and-piers-manhattan-1937-nypl482645-nypl482645p.html'
url2='https://www.nytimes.com/1978/06/21/archives/jordan-wary-of-interim-role-in-west-bank-and-gaza-jordan-accepted.html'
response = requests.get(url2, headers=headers)
fileout="outputTest.html"
obj=open(fileout,"w")
obj.write(response.text)
obj.close()

Downloading the HTML works with url1, but with url2 the script fails with this error:

return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2010' in position 34060: character maps to <undefined>
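The traceback suggests the failure happens when writing, not downloading: in Python 3, `open(fileout, "w")` without an `encoding` argument uses the locale's default encoding, which on Windows is often cp1252, and cp1252 has no mapping for U+2010 (HYPHEN). A minimal sketch that reproduces this offline, assuming cp1252 is the default encoding in the failing environment; `can_encode` and the sample string are hypothetical names for illustration:

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text can be encoded with encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError as exc:
        # For cp1252 this is the same 'charmap' error as in the question.
        print(exc)
        return False


# Hypothetical text containing U+2010 HYPHEN, like the page behind url2.
sample = "West Bank\u2010Gaza"
print(can_encode(sample, "cp1252"))  # False
print(can_encode(sample, "utf-8"))   # True
```

url1 presumably works because its page happens to contain only characters that the default encoding can represent.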

How can I fix the error for url2?

1 answer:

Answer 0 (score: 0)

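Encode the text as UTF-8 yourself and write the resulting bytes to a file opened in binary mode:

obj = open(fileout, "wb")
obj.write(response.text.encode('utf-8'))

instead of

obj = open(fileout, "w")
obj.write(response.text)

(Wrapping the bytes in str(), as in str(response.text.encode('utf-8')), avoids the error but writes the Python representation b'...' with escape sequences such as \xe2\x80\x90 instead of the actual HTML.)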
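An equivalent and arguably simpler fix, assuming Python 3, is to keep text mode but pass `encoding="utf-8"` to `open()` so the file no longer depends on the locale default. In this sketch `save_html` and the sample string are hypothetical, and `requests` is left out so it runs offline:

```python
import os
import tempfile


def save_html(html: str, path: str) -> None:
    """Write an HTML string to path as UTF-8."""
    # Passing encoding explicitly means the result no longer depends on the
    # platform default (for example cp1252 on Windows), so characters such as
    # U+2010 HYPHEN can always be written.
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)


# In the question's script this would be:
#   save_html(response.text, "outputTest.html")
# Here a hypothetical snippet stands in for the downloaded page.
sample_html = "<p>West Bank\u2010Gaza</p>"
out_path = os.path.join(tempfile.gettempdir(), "outputTest.html")
save_html(sample_html, out_path)

# Read the file back with the same encoding to confirm the round trip.
with open(out_path, encoding="utf-8") as f:
    assert f.read() == sample_html
```

If the file should keep exactly the bytes the server sent, in whatever encoding the page declares, writing `response.content` to a file opened with `"wb"` is another option.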