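A few minutes ago this web-scraping code was working, but now I get this warning and garbled encoding. Because the request does not return HTML, BeautifulSoup returns None when I search for a tag's contents. What is going wrong here? I tried Googling this encoding problem but could not find a clear answer.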
import requests
from bs4 import BeautifulSoup
url = 'http://finance.yahoo.com/q?s=aapl&fr=uh3_finance_web&uhb=uhb2'
data = requests.get(url)
soup = BeautifulSoup(data.content).text
print(data)
The result is:
0.0 seconds
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
{}
Process finished with exit code 0
Answer 0: (score: 4)
The following BeautifulSoup constructor call worked for me:
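# Open in binary mode: from_encoding applies only to raw bytes (a text-mode file on Python 3 is already decoded).
soup = BeautifulSoup(open(html_path, 'rb'), "html.parser", from_encoding="iso-8859-1")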
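To try the `from_encoding` approach without a file on disk, here is a minimal sketch. The sample markup and the variable name `raw` are made up for illustration, and it assumes the page really is ISO-8859-1 encoded; it is not a diagnosis of what Yahoo actually returned.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, encoded as ISO-8859-1 so "é" is the single byte 0xE9.
raw = "<html><body><p>café</p></body></html>".encode("iso-8859-1")

# from_encoding tells BeautifulSoup how to decode the bytes instead of guessing.
soup = BeautifulSoup(raw, "html.parser", from_encoding="iso-8859-1")

print(soup.p.get_text())       # the decoded paragraph text
print(soup.original_encoding)  # the encoding BeautifulSoup used for decoding
```

With `requests`, the equivalent raw bytes would be `data.content`, so the same constructor call can be applied there if the encoding is known.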
Answer 1: (score: 0)
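from urllib.request import urlopen
from bs4 import BeautifulSoup
response = urlopen(notiurl)  # notiurl is the URL of the page to fetch
html = response.read().decode(encoding="iso-8859-1")
soup = BeautifulSoup(html, 'html.parser')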
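Check the encoding ---> print(soup.original_encoding) (this is None when BeautifulSoup is given an already-decoded string; pass the raw bytes to see the encoding it detected)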
Documentation -------- https://www.crummy.com/software/BeautifulSoup/bs4/doc/#encodings
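A small sketch of what `original_encoding` reports, depending on whether BeautifulSoup receives a decoded string or raw bytes. The sample markup and variable names are hypothetical, and the sketch assumes the document declares its charset in a `<meta>` tag.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page that declares its own charset.
raw = (
    '<html><head><meta charset="iso-8859-1"></head>'
    "<body><p>café</p></body></html>"
).encode("iso-8859-1")

# Case 1: decode first, then parse. BeautifulSoup gets a str and performs
# no encoding detection, so original_encoding is None.
soup_from_str = BeautifulSoup(raw.decode("iso-8859-1"), "html.parser")
print(soup_from_str.original_encoding)

# Case 2: hand over the raw bytes. BeautifulSoup works out the encoding itself
# and records the one it settled on in original_encoding.
soup_from_bytes = BeautifulSoup(raw, "html.parser")
print(soup_from_bytes.original_encoding)
print(soup_from_bytes.p.get_text())
```

If the value printed in the second case is not what the page actually uses, passing `from_encoding` explicitly is one way to override the guess.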