Question

Good morning. I'm trying to do this, but it won't let me.

Can you help me?

Thank you very much.

soup = BeautifulSoup(html_page)
titulo=soup.find('h3').get_text()
titulo=titulo.replace('§','')

titulo=titulo.replace('§','')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

Answer 1

Declare the source encoding and use unicode strings for the operations:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

html_page = u"<h3>§ title here</h3>"

soup = BeautifulSoup(html_page, "html.parser")

titulo = soup.find('h3').get_text()
titulo = titulo.replace(u'§', '')
print(titulo)

This prints title here (with a leading space where the § used to be).
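
To see where the 0xc2 byte in the error message comes from, here is a minimal sketch using only the standard library and assuming Python 3. The names raw, first_two, ascii_error and cleaned are illustrative and not from the original post. It shows that § is two bytes in UTF-8, that the ASCII codec fails on the first of them, and that decoding as UTF-8 before calling replace avoids the problem.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch: why the 'ascii' codec chokes on the section sign.

# In UTF-8 the character § is stored as the two bytes 0xc2 0xa7.
raw = u"§ title here".encode("utf-8")
first_two = raw[:2]

# Decoding those bytes as ASCII fails on the very first byte (0xc2).
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Decoding as UTF-8 gives real text, and replace() then works on text only.
text = raw.decode("utf-8")
cleaned = text.replace(u"§", u"").strip()
print(cleaned)
```

The strip() call is an addition here to drop the leading space that remains after the § is removed.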

Answer 2

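Let me explain your problem clearly:

By default, Python 2 assumes source files are ASCII, so it does not recognize particular characters such as "à" or "ò" in them. To make Python recognize those characters, put this at the top of your script: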

# -*- coding: utf-8 -*-

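This line tells Python that the source file is encoded in UTF-8, so it recognizes characters it would not recognize by default. Another way to set the encoding is with the "sys" library: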

# Python 2 only. site.py removes sys.setdefaultencoding() at startup, so it does not exist here!
import sys
reload(sys)  # Reloading the sys module makes setdefaultencoding() available again
sys.setdefaultencoding('UTF8')  # Here you choose the encoding
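
A narrower alternative to changing the process-wide default is to decode byte strings explicitly where they enter the program. The sketch below assumes the input is UTF-8 and uses a hypothetical helper, to_text, that is not part of any library. The sample HTML is made up for illustration.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch: decode explicitly instead of changing the default encoding.


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if it is a byte string.

    to_text is a hypothetical helper; the UTF-8 default is an assumption.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Stand-in for a page downloaded as raw bytes.
html_bytes = u"<h3>§ title here</h3>".encode("utf-8")

# Decode once at the boundary, then work with unicode text everywhere else.
page = to_text(html_bytes)
titulo = page.replace(u"§", u"")
print(titulo)
```

Because both the page and the search string are unicode text, replace never has to fall back to an implicit ASCII conversion.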

§ symbol is not recognized
