Question

I am trying to scrape a website using the requests and BeautifulSoup4 packages.

>>> import requests
>>> from bs4 import BeautifulSoup

>>> r = requests.get('https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los+Angeles%2C+CA')

>>> r.content  # shows the page source (a mess), bytes type

>>> soup = BeautifulSoup(r.content, 'html.parser')

When I try to prettify and display the page's HTML code with

print(soup.prettify())

I get the error

UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 44379: character maps to <undefined>

I also tried

>>> soupbytes = soup.prettify(encoding='utf-8')  # this is bytes format
>>> soupstr = soupbytes.decode('utf-8')  # this is str format

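The first one prints without any problem (print(soupbytes)), but the output is not "pretty": it is shown in bytes format. If I try to print the second one (print(soupstr)), I get the same error again, even though the object is a str.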
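The behaviour described above can be reproduced without any network access. The sketch below is only an illustration under an assumption: that the terminal uses a legacy single-byte code page (cp437 is picked here as an example; the question does not say which one is in use). The helper name try_encode is hypothetical. It shows that the en dash '\u2013' encodes fine as UTF-8 but cannot be represented in such a code page, which is what print() has to do when it writes a str to the terminal.

```python
# Assumption: the terminal uses a legacy code page; cp437 is only an example.
text = "coffee \u2013 Los Angeles"  # contains an en dash, like the scraped page


def try_encode(s, encoding):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return s.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)


# UTF-8 can represent every Unicode character, so this yields bytes.
print(try_encode(text, "utf-8"))

# cp437 has no en dash, so this yields the UnicodeEncodeError message instead.
print(try_encode(text, "cp437"))
```

Printing the bytes object works because its repr is plain ASCII, which would also explain why print(soupbytes) succeeds but is not "pretty".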

I should also mention that I do not get any error in the IDE (Spyder). That is, if I run the following code in Spyder:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los+Angeles%2C+CA')

r.content  # shows the page's HTML
soup = BeautifulSoup(r.content,'html.parser')
print(soup.prettify())

I get no error and the output prints fine. Why is there this difference? How can I avoid the error in the terminal?
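One plausible explanation, stated here as an assumption rather than a confirmed diagnosis, is that the Spyder console writes output as UTF-8 while the terminal uses a code page that lacks some characters on the page. Under that assumption, a possible workaround is to replace unencodable characters before writing. The names to_printable and safe_print below are hypothetical helpers, not part of requests or BeautifulSoup.

```python
import sys


def to_printable(text, encoding):
    """Round-trip text through encoding, replacing unencodable characters with '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


def safe_print(text, stream=None):
    """Write text to the stream without raising UnicodeEncodeError."""
    stream = stream or sys.stdout
    # Fall back to UTF-8 if the stream does not report an encoding (assumption).
    encoding = getattr(stream, "encoding", None) or "utf-8"
    stream.write(to_printable(text, encoding) + "\n")


# Stand-in for soup.prettify(); the real call would be safe_print(soup.prettify()).
safe_print("coffee \u2013 Los Angeles")
```

Other options that may work, depending on the setup, are setting the PYTHONIOENCODING environment variable to utf-8 before starting Python, or calling sys.stdout.reconfigure(encoding='utf-8') on Python 3.7 or later. Whether the characters then display correctly still depends on the terminal and its font.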

.prettify() python 3

0 answers: