I get an error when web scraping with Python BeautifulSoup

Time: 2016-04-25 13:42:27

Tags: python beautifulsoup python-requests

I am getting the following error.

Traceback (most recent call last):
  File "ex1.py", line 9, in <module>
    print(soup.prettify())
  File "C:\Python34\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 35013: character maps to <undefined>

My source code is as follows:

import requests
from bs4 import BeautifulSoup

url = 'http://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los+Angeles%2C+CA'
response = requests.get(url)
html = response.content

soup = BeautifulSoup(html, "html.parser")
print(soup.prettify())

2 answers:

Answer 0 (score: 0)

Changing the code to the following worked for me (note that the print statement is Python 2 syntax):

html = response.text
soup = BeautifulSoup(html)
print soup.prettify()

Answer 1 (score: 0)

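Are you running this on Windows? The problem comes from encoding: the traceback shows Python encoding the printed output with cp437, and that code page has no mapping for '\u2013' (an en dash), which appears in your HTML content.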
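The failure can be reproduced without any network access, which makes the cause easier to see. The following is a minimal sketch; the sample string and the helper name try_encode are made up for illustration and are not part of the original question.

```python
# A sample string containing an en dash (U+2013), the character
# named in the traceback. The text itself is a made-up example.
text = "Coffee \u2013 Los Angeles"


def try_encode(s, encoding):
    """Return the encoded bytes, or the error message if encoding fails.

    try_encode is a hypothetical helper used only for this demonstration.
    """
    try:
        return s.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)


# cp437 has no mapping for U+2013, so this yields an error message.
cp437_result = try_encode(text, "cp437")
# UTF-8 can represent every Unicode character, so this yields bytes.
utf8_result = try_encode(text, "utf-8")

print(cp437_result)
print(utf8_result)
```

print() performs the same kind of encoding step when the console uses cp437, which is why the error appears only when the page is printed and not when it is downloaded or parsed.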

I think this should work:

import requests
from bs4 import BeautifulSoup

url = 'http://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los+Angeles%2C+CA'
response = requests.get(url)
html = response.content

soup = BeautifulSoup(html, "html.parser")
print(soup.prettify().encode('UTF-8'))

Passing the encoding argument to prettify() should also work. Like this:

soup.prettify(encoding='utf-8')
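Note that in Python 3, printing a bytes object shows its b'...' representation rather than the readable page text. If readable console output is the goal, one possible approach is to replace characters the console cannot display before printing. The sketch below assumes that losing those characters (they become "?") is acceptable; safe_for_console is a hypothetical helper, not part of BeautifulSoup or requests.

```python
import sys


def safe_for_console(text, encoding=None):
    """Return text with characters the target encoding cannot represent replaced by '?'.

    safe_for_console is a hypothetical helper for illustration. When no
    encoding is given, it falls back to the console encoding, then to UTF-8.
    """
    encoding = encoding or sys.stdout.encoding or "utf-8"
    # errors="replace" substitutes '?' for unencodable characters
    # instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="replace").decode(encoding)


# Made-up sample standing in for soup.prettify() output.
sample = "Coffee \u2013 Los Angeles"

# With an explicit cp437 target, the en dash is replaced.
print(safe_for_console(sample, "cp437"))
# With the console's own encoding, printing cannot fail on encoding.
print(safe_for_console(sample))
```

With the code from the question this would be called as print(safe_for_console(soup.prettify())). If the full text must be preserved, writing the bytes returned by soup.prettify(encoding='utf-8') to a file opened in binary mode, or to sys.stdout.buffer, avoids the console codec entirely.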