I'm trying to scrape a website using the requests and BeautifulSoup4 packages.
>>> import requests
>>> from bs4 import BeautifulSoup
>>> r = requests.get('https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los+Angeles%2C+CA')
>>> r.content  # shows the page source (a mess), as bytes
>>> soup = BeautifulSoup(r.content, 'html.parser')
When I try to prettify and display the page's HTML with print(soup.prettify()), I get this error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2013'
in position 44379: character maps to <undefined>
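The error can be reproduced without touching the network. The sketch below assumes the terminal uses a legacy code page such as cp437 (an assumption; the actual console encoding is not stated in this post) and tries to encode a string containing the same character, '\u2013' (an en dash). The names text and try_encode are illustrative only.

```python
# Minimal offline reproduction. "cp437" is an assumed stand-in for the
# terminal's code page; the real one can be checked with sys.stdout.encoding.
text = "coffee \u2013 Los Angeles"  # contains an en dash


def try_encode(s, encoding):
    """Encode s; return the bytes on success or the error message on failure."""
    try:
        return s.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)


# cp437 has no en dash, so this yields the UnicodeEncodeError message.
message = try_encode(text, "cp437")
print(message)

# UTF-8 can represent every Unicode character, so this yields bytes.
utf8_bytes = try_encode(text, "utf-8")
print(utf8_bytes)
```

If the first call fails this way while the second succeeds, the problem is most likely the output encoding rather than BeautifulSoup itself.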
I also tried:
>>> soupbytes = soup.prettify(encoding='utf-8')  # this is bytes
>>> soupstr = soupbytes.decode('utf-8')  # this is str
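Printing the first one (print(soupbytes)) works without any problem, but the output is not "pretty" because it is in bytes format. If I try to print the second one (print(soupstr)), I get the error again, even though the object is of type str.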
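The different behaviour of the two prints is probably explained by how print handles bytes: it prints the repr of the object, which consists only of ASCII characters and escape sequences, so nothing has to be encoded that the console lacks, but newlines appear as \n instead of real line breaks. A small sketch with a made-up HTML fragment (pretty, as_bytes and shown are illustrative names, not part of my script):

```python
# A made-up fragment standing in for the output of soup.prettify().
pretty = "<p>\n coffee \u2013 tea\n</p>"

# Same idea as soup.prettify(encoding='utf-8'): the text as UTF-8 bytes.
as_bytes = pretty.encode("utf-8")

# print(as_bytes) writes this representation: pure ASCII, with the newline
# and the en dash shown as escape sequences rather than real characters.
shown = repr(as_bytes)
print(shown)

# Decoding gives back the original str, en dash included, which the console
# must then be able to encode when it is printed.
round_trip = as_bytes.decode("utf-8")
```

So the bytes version never fails, while the str version fails whenever the console encoding has no mapping for a character in it.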
I should also say that I don't get any error in the IDE (Spyder). That is, if I run the following code in Spyder:
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los+Angeles%2C+CA')
r.content  # shows the page's HTML
soup = BeautifulSoup(r.content, 'html.parser')
print(soup.prettify())
I don't get any error, and the output prints fine. Why is there this difference? How can I avoid the error in the terminal?
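One possible explanation, offered as a guess rather than a confirmed diagnosis: the Spyder console and the terminal report different values for sys.stdout.encoding, and the terminal's encoding has no mapping for some characters in the page. Under that assumption, a minimal workaround sketch is to replace unencodable characters before printing. The helper name safe_text is hypothetical, not part of requests or BeautifulSoup.

```python
import sys


def safe_text(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by '?'.

    If no encoding is given, fall back to the one stdout reports, and to
    UTF-8 when stdout does not report any.
    """
    encoding = encoding or sys.stdout.encoding or "utf-8"
    return text.encode(encoding, errors="replace").decode(encoding)


# Comparing this value in Spyder and in the terminal shows whether the
# two environments really differ.
print(sys.stdout.encoding)

sample = "coffee \u2013 Los Angeles"  # stand-in for soup.prettify()
print(safe_text(sample))

# With a deliberately limited encoding the en dash becomes '?'.
ascii_version = safe_text(sample, "ascii")
```

In the script this would be used as print(safe_text(soup.prettify())). Alternatives that avoid losing characters include setting the PYTHONIOENCODING environment variable to utf-8 before starting Python, or calling sys.stdout.reconfigure(encoding='utf-8') on Python 3.7 and later; whether the characters then display correctly still depends on the terminal and its font.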