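I have a problem with this code, which I am running with Python 3.4 in PyCharm.
When I pass the variable html_text to BeautifulSoup, the script stops running
(I am using BeautifulSoup4).
The error message is:
UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 52793: character maps to <undefined>
Why does this happen, and how can I fix it?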
import urllib.request
from bs4 import BeautifulSoup

url = 'http://nytimes.com'
urls = [url]     # stack of urls
visited = [url]  # already visited urls to avoid revisiting

while len(urls) > 0:
    try:
        html_text = urllib.request.urlopen(urls[0]).read()
    except:
        print(urls[0])
    soup = BeautifulSoup(html_text, 'html5lib')
    urls.pop(0)
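
A possible explanation, offered as a guess because the full traceback is not shown: a 'charmap' UnicodeEncodeError is raised when a str is encoded with a legacy single-byte codec, such as the Windows code page cp1252, that has no mapping for a character like '\ufffd'. This typically happens when text is printed or written to a file, not during parsing itself. The sketch below reproduces that error without any network access and shows one workaround, replacing unencodable characters. The helper name safe_text, the sample string, and the choice of cp1252 are assumptions made for illustration; they are not taken from the code above.

```python
def safe_text(text, encoding="cp1252"):
    """Return text with characters the target codec cannot encode replaced by '?'.

    safe_text is a hypothetical helper; cp1252 is an assumed console encoding.
    """
    return text.encode(encoding, errors="replace").decode(encoding)


# Made-up sample containing U+FFFD, the character named in the error message.
sample = "headline \ufffd text"

# Strict encoding with a charmap codec fails for U+FFFD.
try:
    sample.encode("cp1252")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Replacing unencodable characters yields text that is safe to print.
cleaned = safe_text(sample)
print(strict_failed, cleaned)
```

If the error does come from output, another option is to write to a file opened with encoding='utf-8', which can represent every Unicode character.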