Question

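The page is encoded in UTF-8, and Python's HTMLParser handles it fine, with no UnicodeDecodeError, but when I try to parse it with BeautifulSoup I do get an error. I have tried the "# -*- coding: utf-8 -*-" declaration and put .encode('utf-8') everywhere, and I still get the error.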

import urllib
from BeautifulSoup import BeautifulSoup
args=urllib.urlencode({'keywords':'magic'})
doc=urllib.urlopen('http://www.example.com/submit', args)
soup=BeautifulSoup(doc)
stuff = soup.findAll('section',id='banner')
print stuff

Traceback (most recent call last):
  File "test.py", line 7, in <module>
    print stuff
UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 112: ordinal not in range(128)

Answer 1

OK, I found the solution on my last attempt; maybe it will help someone else with the same problem. The data needed to be encoded, not decoded:

print( [e.encode('utf-8', 'ignore') for e in stuff] )
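To illustrate why encoding helps here, the following is a minimal sketch in Python 3 syntax (the thread itself uses Python 2). The sample string and the helper name encodes_as are made up for illustration; only the character u'\xed' comes from the traceback above.

```python
# Hypothetical sample text containing u'\xed', the character from the traceback.
text = "caf\xed"

def encodes_as(value, encoding):
    """Return True if value can be encoded with the given codec."""
    try:
        value.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# The ASCII codec cannot represent the character, which is what print hit.
ascii_ok = encodes_as(text, "ascii")

# Encoding explicitly to UTF-8 produces bytes that any output stream accepts.
utf8_bytes = text.encode("utf-8")

# With errors='ignore', characters the codec cannot represent are dropped.
ascii_lossy = text.encode("ascii", "ignore")
```

Note that 'ignore' silently discards characters the target codec cannot represent. UTF-8 can represent every character in the sample, so the option has no visible effect on that call.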

Answer 2

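You should not get a UnicodeEncodeError: 'ascii'... error when printing. This usually happens when your locale is broken or set to C, in which case Python cannot set an appropriate encoder on the stdout stream.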

Run locale and check for errors or warnings.

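If you cannot fix the locale, you can usually override Python's stdout encoder by setting PYTHONIOENCODING in the environment to the encoding your terminal emulator uses. Typically you can do this with: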

export PYTHONIOENCODING=UTF-8

or

PYTHONIOENCODING=UTF-8 python my_script.py
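To see which encoder Python actually picked, something like the following sketch may help. It uses Python 3 syntax, and the helper names describe_output_encoding and write_with_encoding are hypothetical. The second helper simulates a stream with a given encoding, so the effect of an ASCII stdout can be reproduced without touching the real locale.

```python
import io
import locale
import sys

def describe_output_encoding():
    """Report the encodings Python derived from the environment."""
    return {
        "stdout": getattr(sys.stdout, "encoding", None),
        "locale": locale.getpreferredencoding(False),
    }

def write_with_encoding(text, encoding):
    """Write text through a text stream using `encoding`; return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)  # raises UnicodeEncodeError if the codec cannot encode text
    stream.flush()
    return raw.getvalue()

info = describe_output_encoding()
utf8_out = write_with_encoding("\xed", "utf-8")

try:
    write_with_encoding("\xed", "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

If the "stdout" entry reports an ASCII codec (for example ANSI_X3.4-1968 under the C locale), printing non-ASCII text is likely to fail in the same way as in the question.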

Using encodings in Python with BeautifulSoup
