Running this code:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("my.html"))
print(soup.prettify())
produces this error:
Traceback (most recent call last):
  File "soup.py", line 5, in <module>
    print(soup.prettify())
  File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u25ba' in position 9001: character maps to <undefined>
Then I tried:
print(soup.encode('UTF-8').prettify())
but that fails because soup.encode('UTF-8') returns a bytes object, which has no prettify method:
Traceback (most recent call last):
  File "soup.py", line 11, in <module>
    print(soup.encode('UTF-8').prettify())
AttributeError: 'bytes' object has no attribute 'prettify'
I'm not sure how to solve this. Any input would be appreciated.
Answer 0 (score: 3)
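Your (Windows) console is using the cp437 encoding, and that encoding does not support a character in the soup. By default an unencodable character raises an exception, but you can change the error handler: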
import sys,io
from bs4 import BeautifulSoup
sys.stdout = io.TextIOWrapper(sys.stdout.buffer,'cp437','backslashreplace')
soup = BeautifulSoup(open("my.html"))
print(soup.prettify())
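To see what the error handler changes without needing a Windows console or an HTML file, here is a minimal sketch. It assumes an in-memory byte buffer standing in for sys.stdout.buffer, and the sample string is made up for illustration; only the '\u25ba' character comes from the traceback above.

```python
import io

# Sample text containing the character from the traceback (U+25BA).
# The surrounding word is a made-up example.
text = "play \u25ba"

# With the default 'strict' handler, cp437 cannot encode the character.
try:
    text.encode("cp437")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Stand-in for the console: an in-memory byte stream instead of
# sys.stdout.buffer (an assumption made so the sketch runs anywhere).
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp437", errors="backslashreplace")

# end="" avoids platform-specific newline translation in the output.
print(text, file=console, end="")
console.flush()

output = raw.getvalue()
print(strict_failed)
print(output)
```

With 'backslashreplace', the unsupported character is written as an escape sequence rather than raising, so the rest of the output still reaches the console.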
Alternatively, write the soup to a file and read it with an editor that supports the encoding:
# On Windows, utf-8-sig will allow the file to be read by Notepad.
with open('out.txt','w',encoding='utf-8-sig') as f:
    f.write(soup.prettify())
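As a rough check of what utf-8-sig does, the sketch below writes a string to a temporary file and reads it back. The temporary path and the sample string are assumptions for illustration; in the snippet above the text would be soup.prettify() and the file would be out.txt.

```python
import codecs
import os
import tempfile

# Made-up sample text containing the problematic character.
text = "play \u25ba"

# Hypothetical output location in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "out.txt")

# utf-8-sig writes a UTF-8 byte order mark before the encoded text.
with open(path, "w", encoding="utf-8-sig") as f:
    f.write(text)

# Inspect the raw bytes to confirm the BOM is present.
with open(path, "rb") as f:
    raw = f.read()

# Reading back with utf-8-sig strips the BOM again.
with open(path, encoding="utf-8-sig") as f:
    round_tripped = f.read()

print(raw.startswith(codecs.BOM_UTF8))
print(round_tripped == text)
```

The byte order mark is what lets Notepad recognize the file as UTF-8; readers that use plain utf-8 instead of utf-8-sig will see it as a leading '\ufeff' character.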