Question

I'm a Python beginner. I wrote the following code:

from bs4 import BeautifulSoup
import requests

url = "http://www.google.com"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
links = soup.find_all("a")
for link in links:
    print(link.text)

When I run this .py file in Windows PowerShell, print(link.text) raises the following error:

UnicodeEncodeError: 'gbk' codec can't encode character '\xbb' in position 5: illegal multibyte sequence
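For context (an explanatory addition, not part of the original question): print() has to encode each str into bytes using the console's encoding, which here is GBK, the code page used by Simplified Chinese Windows. The character '\xbb' in the message is '»' (U+00BB, a right-pointing double angle quotation mark often used in "next page" links), and GBK has no mapping for it, so this is an encoding error rather than a decoding one. A minimal, network-free reproduction, assuming a sample string chosen so that '»' sits at position 5 as in the error above:

```python
# Reproduce the error without a network request or a GBK console.
sample = "Next \xbb"  # hypothetical link text; '\xbb' is '»' (U+00BB)

caught = None
try:
    sample.encode("gbk")  # roughly what print() does when stdout is GBK-encoded
except UnicodeEncodeError as exc:
    caught = exc  # keep a reference; 'exc' is unbound after the except block
    print(exc)
```

Chinese characters themselves encode to GBK without trouble; it is characters outside GBK, like '»', that fail.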

I know this error is caused by some Chinese characters, and it seems I should use 'decode' or 'ignore', but I don't know how to fix my code. Please help! Thanks!

Answer 1

If you don't want to display these special characters, you can ignore them like this:

# Drop the characters the gbk console cannot encode, then decode back to str
print(link.text.encode("gbk", errors="ignore").decode("gbk"))
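The same idea as a small reusable helper, assuming the console encoding is GBK (you could pass sys.stdout.encoding instead of hard-coding it). errors="ignore" drops the characters GBK cannot represent, while errors="replace" substitutes '?' so you can still see where something was removed. The name console_safe is hypothetical, chosen for this sketch:

```python
def console_safe(text, encoding="gbk", errors="ignore"):
    """Return `text` with characters that `encoding` cannot represent
    dropped (errors="ignore") or replaced by '?' (errors="replace")."""
    # Round-trip through bytes: the encode step applies the error handler,
    # the decode step turns the result back into a str for print().
    return text.encode(encoding, errors=errors).decode(encoding)


dropped = console_safe("Next \xbb")                     # "Next "  ('»' removed)
replaced = console_safe("Next \xbb", errors="replace")  # "Next ?"
```

In the question's loop this would be used as print(console_safe(link.text)).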

Answer 2

You can encode the string as UTF-8:

for link in links:
    print(link.text.encode('utf8'))
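What this prints, sketched on a sample string that stands in for link.text (an assumption): encode('utf8') turns the str into bytes, and print() then writes the bytes object's repr, which is pure ASCII, so the GBK console no longer fails. The trade-off is that non-ASCII characters appear as escape sequences inside b'...' rather than as readable text:

```python
sample = "Next \xbb"          # hypothetical link text containing '»'
data = sample.encode("utf8")  # str -> bytes; UTF-8 can encode any character
print(data)                   # prints: b'Next \xc2\xbb'  ('»' is 0xC2 0xBB in UTF-8)

# Decoding the bytes with the same codec gives back the original text.
restored = data.decode("utf8")
```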

But a better way is:

response = requests.get(url)
soup = BeautifulSoup(response.text.encode("utf8"), "html.parser")

To learn more about the problem you're facing, see this Stack Overflow answer.
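A different option, not taken from either answer: since the error is raised when print() writes to stdout, you can change stdout's error handler once instead of converting every string. This sketch assumes Python 3.7 or later, where sys.stdout is normally an io.TextIOWrapper with a reconfigure() method; tolerate_unencodable_output is a hypothetical helper name. With errors="replace", characters GBK cannot represent come out as '?'. The demonstration uses an in-memory GBK stream so it behaves the same on any machine:

```python
import io
import sys


def tolerate_unencodable_output(stream=None):
    """Make `stream` (default: sys.stdout) write '?' for characters its
    encoding cannot represent instead of raising UnicodeEncodeError.
    Returns True if the stream was reconfigured, False otherwise."""
    if stream is None:
        stream = sys.stdout
    # io.TextIOWrapper.reconfigure() was added in Python 3.7; other stream
    # objects (or older Pythons) may not have it, so check before calling.
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        return False
    reconfigure(errors="replace")  # keep the encoding, relax the error handler
    return True


# Demonstration: an in-memory text stream that encodes to GBK, like the console.
raw = io.BytesIO()
fake_console = io.TextIOWrapper(raw, encoding="gbk", newline="\n")
changed = tolerate_unencodable_output(fake_console)
print("Next page \xbb", file=fake_console)  # would raise with errors="strict"
fake_console.flush()                        # push buffered text into `raw`
gbk_bytes = raw.getvalue()                  # b'Next page ?\n'
```

In the question's script you would call tolerate_unencodable_output() once, before the loop. On older Python 3 versions, setting the PYTHONIOENCODING environment variable to gbk:replace before starting Python has a similar effect.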
