Beautiful Soup 4 does not print text from a web page

Time: 2014-04-28 06:19:17

Tags: python networking encoding

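I am using Python 3.4 with Beautiful Soup 4 and requests. I am trying to fetch a web page and print its text with Beautiful Soup. It fetches the page and prints the title, and it can even prettify the output if I supply an encoding, i.e. utf-8, but when I try to print the text from the page it fails with an encoding error.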

from bs4 import BeautifulSoup
import requests

sparknotesSearch = requests.get("http://www.sparknotes.com/search?q=Sonnet")
soup = BeautifulSoup(sparknotesSearch.text, "html.parser")

print(soup.title)
# Can't print this?
print(soup.get_text())

The error/output I get is:

<title>SparkNotes Search Results: sONNET</title>
Traceback (most recent call last):
  File "C:\Users\Cayle J. Elsey\Dropbox\Programming\Salient_Point\networking.py", line 10, in <module>
    print(soup.get_text())
  File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2192' in position 6238: character maps to <undefined>
[Finished in 0.5s]

1 Answer:

Answer 0: (score: 0)

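Just encode your string as UTF-8 and your problem will be solved. The traceback shows print() encoding the text with cp1252, which has no mapping for the character '\u2192'.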

# prettify() returns a str; encoding it yields bytes, which print() can always display
html = soup.prettify()
html = html.encode('UTF-8')
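
Note that printing the encoded bytes shows their repr (b'...') rather than readable text. If the goal is readable output on a console limited to cp1252, one option is to replace the characters the console cannot represent before printing. The following is only a minimal sketch under that assumption; safe_text is a hypothetical helper name, not part of Beautiful Soup or requests, and the sample string stands in for soup.get_text().

```python
import sys


def safe_text(text, encoding=None):
    """Replace characters that the target encoding cannot represent.

    Hypothetical helper: falls back to the console encoding, then to UTF-8.
    """
    encoding = encoding or sys.stdout.encoding or "utf-8"
    # errors="replace" substitutes "?" for unencodable characters instead of raising
    return text.encode(encoding, errors="replace").decode(encoding)


# Stand-in for soup.get_text(); U+2192 is the arrow reported in the traceback
sample = "Next \u2192 page"
cleaned = safe_text(sample, "cp1252")
print(cleaned)
```

With the real page, print(safe_text(soup.get_text())) would be the equivalent call. The trade-off is that unsupported characters are lost; writing the UTF-8 bytes to a file instead keeps them intact.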