Urllib BeautifulSoup Python 3.3 page-specific error

Time: 2015-01-07 12:00:41

Tags: python-3.x beautifulsoup urllib

from urllib.request import urlopen 
from bs4 import BeautifulSoup
content = urlopen("http://en.wikipedia.org/wiki/List_of_human_stampedes")
soup = BeautifulSoup(content)
print(soup.get_text())
print(soup.prettify())

Error:

Traceback (most recent call last):
  File "C:\Users\sony\Desktop\Trash\Crawler Try\try3.py", line 5, in <module>
    print(soup.get_text())
  File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u014d' in position 10487: character maps to <undefined>
[Finished in 2.1s with exit code 1]

It seems to be page-specific. For example, I get this when http://www.quora.com replaces the URL.
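
The traceback ends in C:\Python34\lib\encodings\cp1252.py, which suggests the failure happens when print() encodes the text for the console, not inside urllib or BeautifulSoup. cp1252 has no mapping for '\u014d' (ō), so a page would fail only if its text contains a character outside that code page. Below is a minimal sketch of one possible workaround, assuming the output stream really is cp1252; the helper name `safe_text` and the sample string are hypothetical and do not come from the question.

```python
def safe_text(text, encoding="cp1252"):
    """Replace characters that `encoding` cannot represent with '?'.

    Hypothetical helper: it round-trips the text through the target
    encoding so that a later print() has nothing left to reject.
    """
    return text.encode(encoding, errors="replace").decode(encoding)


# Hypothetical sample containing U+014D, the character named in the traceback.
sample = "T\u014dky\u014d stampede"

# Strict encoding reproduces the failure without needing a Windows console.
try:
    sample.encode("cp1252")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

cleaned = safe_text(sample)
print(cleaned)
```

With the script from the question, the same idea would be print(safe_text(soup.get_text())). This loses the unmappable characters; keeping them would require an output stream that uses UTF-8 instead, for example by writing the text to a file opened with encoding="utf-8".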

0 answers:

No answers