以下是完整的脚本:
import requests
import bs4
res = requests.get('https://example.com')
soup = bs4.BeautifulSoup(res.text, 'lxml')
page_HTML_code = soup.prettify()
multiline_code = """{}""".format(page_HTML_code)
f = open("testfile.txt","w+")
f.write(multiline_code)
f.close()
所以我试图将整个Downloaded HTML写成文件,同时保持整洁干净。
我确实知道文本有问题,无法保存某些字符,但我不确定如何正确编码文本。
有人可以帮忙吗?
这是我将收到的错误消息
"C:\Location", line 16, in <module>
f.write(multiline_code)
File "C:\\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0421' in position 209: character maps to <undefined>
答案 0 :(得分:1)
我做了一些挖掘,这有效:
import requests
import bs4
res = requests.get('https://example.com')
soup = bs4.BeautifulSoup(res.text, 'lxml')
page_HTML_code = soup.prettify()
multiline_code = """{}""".format(page_HTML_code)
#add the Encoding part when opening file and this did the trick
with open('testfile.html', 'w+', encoding='utf-8') as fb:
fb.write(multiline_code)