.prettify() in Python 3

Date: 2017-04-06 16:34:37

Tags: python python-3.x unicode beautifulsoup

I am trying to scrape a website using the requests and BeautifulSoup4 packages.

>>> import requests
>>> from bs4 import BeautifulSoup

>>> r = requests.get('https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los+Angeles%2C+CA')

>>> r.content  # shows the page source (messy), bytes type

>>> soup = BeautifulSoup(r.content, 'html.parser')

When I try to prettify and display the page's HTML code using

print(soup.prettify())

I get the error

UnicodeEncodeError: 'charmap' codec can't encode character '\u2013'
in position 44379: character maps to <undefined>
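The same kind of error can be reproduced without any network access. This is a minimal sketch, assuming the terminal uses a legacy code page such as cp437 (an assumption; the question does not state the actual console encoding). The sample string is made up for illustration.

```python
# Hypothetical sample text containing an en dash (U+2013).
text = "coffee \u2013 Los Angeles"

try:
    # cp437 is a charmap-based codec with no en dash, so encoding fails.
    text.encode("cp437")
    message = None
    failed_at = None
except UnicodeEncodeError as exc:
    message = str(exc)
    failed_at = exc.start  # index of the first character that could not be encoded

print(message)
```

Printing a str to a terminal performs this encoding step implicitly, which is why the failure only shows up at print time.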

I also tried

>>> soupbytes = soup.prettify(encoding='utf-8')  # this is bytes format
>>> soupstr = soupbytes.decode('utf-8')  # this is str format

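The first one prints without any problem (print(soupbytes)), but the output is not "pretty" because it is in bytes format. If I try to print the second one (print(soupstr)), I get the error again, even though the object is of type str.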
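The difference between the two attempts can be illustrated with plain strings. In this sketch the variable names and the sample markup are hypothetical; it only shows why printing bytes gives an escaped, single-line representation, while decoding gives back the same str that failed before.

```python
# Hypothetical stand-in for the output of soup.prettify().
pretty = "<p>\n coffee \u2013 tea\n</p>"

# Equivalent in spirit to soup.prettify(encoding='utf-8'): a bytes object.
as_bytes = pretty.encode("utf-8")

# print(as_bytes) displays the repr, with newlines and non-ASCII bytes escaped.
shown_for_bytes = repr(as_bytes)

# Decoding yields the original str again, so printing it hits the same
# console-encoding problem as print(soup.prettify()).
round_trip = as_bytes.decode("utf-8")

print(shown_for_bytes)
```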

I should also say that I do not get any error in the IDE (Spyder). That is, if I run the following code in Spyder:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los+Angeles%2C+CA')

r.content  # shows the page's HTML
soup = BeautifulSoup(r.content, 'html.parser')
print(soup.prettify())

I get no errors and the output prints fine. Why is there this difference, and how can I avoid the error in the terminal?
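A likely explanation (an assumption, since the console encodings are not given) is that the Spyder console accepts the full Unicode range while the terminal uses a legacy code page that lacks some characters. One possible workaround is sketched below: encode the text with the stream's own encoding, replacing unencodable characters with backslash escapes, before printing. The helper name safe_print is hypothetical and not part of requests or BeautifulSoup.

```python
import io
import sys


def safe_print(text, stream=None):
    """Print text, escaping characters the stream's encoding cannot represent."""
    stream = stream if stream is not None else sys.stdout
    # Fall back to UTF-8 if the stream does not report an encoding.
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # Unencodable characters become escapes such as \u2013 instead of raising.
    printable = text.encode(encoding, errors="backslashreplace").decode(encoding)
    print(printable, file=stream)


# Simulate a cp437 terminal to show the effect (hypothetical setup).
raw = io.BytesIO()
fake_terminal = io.TextIOWrapper(raw, encoding="cp437", newline="\n")
safe_print("coffee \u2013 tea", stream=fake_terminal)
fake_terminal.flush()
captured = raw.getvalue()
```

In the question's code this would be called as safe_print(soup.prettify()). Setting the environment variable PYTHONIOENCODING to utf-8 before starting Python is another option, though whether the characters then display correctly depends on the terminal itself.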

0 answers:

No answers