Question

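Hello, I'm writing scraping code, but when I try to get all the paragraphs from a website it gives me the following error: UnicodeEncodeError: 'charmap' codec can't encode character '\xa9'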

Here is my code:


# Load libraries
import urllib.request
from bs4 import BeautifulSoup

# Define the URL to scrape
newsurl = "http://www.techspot.com/news/67832-netflix-exceeds-growth-expectations-home-abroad-stock-soars.html"
thepage = urllib.request.urlopen(newsurl)

soup = BeautifulSoup(thepage, "html.parser")

# attrs must be a dict ({'class': ...}), not a set ({'class', ...})
article = soup.find_all('div', {'class': 'articleBody'})

for pg in article:
	# Search inside this article div (pg), not the whole page (soup)
	for paragraph in pg.find_all('p'):
		print(paragraph.get_text())


The error I get is the one quoted above.

Please tell me how to get rid of this error.

Answer 1

soup.findAll() returns a ResultSet object, which is basically a list and has no encode attribute. Either use .text instead:

text = soup.text

Or "join" the text. str.join() needs strings, so convert each Tag with get_text() first:

text = "".join(tag.get_text() for tag in soup.find_all('p'))
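A minimal sketch combining both ideas with the question's articleBody selector. It assumes a small hypothetical HTML string in place of the downloaded page so it runs offline; the markup and text are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the example runs offline
html = """
<div class="articleBody">
  <p>First paragraph.</p>
  <p>Second paragraph \xa9 example.</p>
</div>
<p>Footer text outside the article.</p>
"""

soup = BeautifulSoup(html, "html.parser")

paragraphs = []
for body in soup.find_all("div", {"class": "articleBody"}):
    # get_text() turns each Tag into a plain str, which str.join() requires
    paragraphs.extend(p.get_text() for p in body.find_all("p"))

# One string, one paragraph per line; the footer <p> is excluded
text = "\n".join(paragraphs)
```

Note that this only builds the string. Printing it can still raise the 'charmap' error if the console cannot show '©'.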

Answer 2

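This error can occur when you print text scraped with Beautiful Soup 4 (bs4) or fetched with requests, because print() has to encode the string for the console, and the console's encoding may not support every character. So try encoding the data yourself in your print statement: print(myHtmlData.encode("utf-8"))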
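A minimal sketch of why the error happens and two ways around it. It assumes cp437 (a legacy Windows console code page that has no '©') as the console encoding; the sample string and the helper name safe_for_console are hypothetical.

```python
# Hypothetical sample text containing the copyright sign U+00A9 ('\xa9')
sample = "Copyright \xa9 example"

def safe_for_console(text: str, encoding: str) -> str:
    """Hypothetical helper: replace characters the encoding cannot represent with '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)

# A strict encode to cp437 fails with the same kind of 'charmap' error as in the question
strict_error = None
try:
    sample.encode("cp437")
except UnicodeEncodeError as exc:
    strict_error = exc
    print("strict encode failed:", exc)

# Answer 2's approach: print the UTF-8 bytes (shown as a b'...' literal)
print(sample.encode("utf-8"))

# Alternative: keep it a str and substitute '?' for unsupported characters
print(safe_for_console(sample, "cp437"))
```

Encoding to UTF-8 avoids the exception but prints a bytes literal such as b'Copyright \xc2\xa9 example'. On Python 3.7+ you can instead change the stream's encoding with sys.stdout.reconfigure(encoding="utf-8"), or set the PYTHONIOENCODING=utf-8 environment variable before running the script.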

UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in Python
