UnicodeEncodeError: character maps to <undefined>

Date: 2017-03-23 18:08:59

Tags: python python-3.x beautifulsoup

I have a problem with the code below, which I run with Python 3.4 in PyCharm. It stops working when I pass the variable html_text to BeautifulSoup (I am using BeautifulSoup4).

The error message is:

  

UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 52793: character maps to <undefined>

Why does this happen, and how can I fix it?

import urllib.request
from bs4 import BeautifulSoup

url = 'http://nytimes.com'

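urls = [url]  # queue of URLs still to fetch
visited = [url]  # already visited urls to avoid revisiting

while len(urls) > 0:
    current = urls.pop(0)  # always remove the URL, even if the fetch fails
    try:
        html_text = urllib.request.urlopen(current).read()
    except Exception:
        print(current)  # report the URL that could not be fetched
        continue  # html_text is not valid here, so skip parsing
    soup = BeautifulSoup(html_text, 'html5lib')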
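
The 'charmap' codec named in the error message is the one Python uses for single-byte code pages such as cp1252, and U+FFFD is the replacement character that decoders emit for bytes they could not decode. The following is a minimal sketch, assuming the error is raised when the parsed text is printed or written to a console or file that uses such a code page (the code above does not show that step, so this is an assumption). The sample string and the variable names are made up for illustration.

```python
import io

# Illustrative sample text containing U+FFFD, the replacement character.
text = "caf\ufffd"

# Reproduce the error: cp1252 (a "charmap" codec) has no byte for U+FFFD.
try:
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    error_message = str(exc)

# Option 1: substitute unencodable characters instead of raising.
safe_bytes = text.encode("cp1252", errors="replace")

# Option 2: write through a UTF-8 text stream, which can encode any character.
buffer = io.BytesIO()
writer = io.TextIOWrapper(buffer, encoding="utf-8")
writer.write(text)
writer.flush()
utf8_bytes = buffer.getvalue()
```

If the failing step turns out to be a print to the PyCharm console, setting the PYTHONIOENCODING environment variable to utf-8 in the run configuration may help. If it is a file write, passing encoding='utf-8' to open() is the equivalent of option 2.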

0 answers:

No answers