I am trying to do some web scraping with the following code:
from bs4 import BeautifulSoup
import requests
import pandas as pd
page = requests.get('https://www.google.com/search?q=phagwara+weather')
soup = BeautifulSoup(page.content, 'html-parser')
day = soup.find(id='wob_wc')
print(day.find_all('span'))
But I keep getting the following error:
File "C:\Users\myname\Desktop\webscraping.py", line 6, in <module>
soup = BeautifulSoup(page.content, 'html-parser')
File "C:\Users\myname\AppData\Local\Programs\Python\Python38-32\lib\site-packages\bs4\__init__.py", line 225, in __init__
raise FeatureNotFound(
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html-parser. Do you need to install a parser library?
I installed lxml and html5lib, but the problem persists.
Answer 0 (score: 2)
You need to change 'html-parser' to 'html.parser': soup = BeautifulSoup(page.content, 'html.parser')
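To illustrate the fix, here is a minimal sketch that checks which parser names Beautiful Soup accepts. The `try_parser` helper and the inline `html` string are hypothetical, introduced only for this example; they stand in for the downloaded page so that no network request is needed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical stand-in for the downloaded page content.
html = '<div id="wob_wc"><span>Sunny</span></div>'


def try_parser(name):
    """Return True if Beautiful Soup can find a tree builder called `name`."""
    try:
        BeautifulSoup(html, name)
        return True
    except FeatureNotFound:
        # Raised when no installed parser matches the requested name.
        return False


print(try_parser("html.parser"))  # built-in parser, spelled with a dot
print(try_parser("html-parser"))  # the misspelling from the question
```

Installing lxml or html5lib does not help here, because the misspelled name matches none of the registered parsers.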
Answer 1 (score: 0)
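You should name the tag as well, so use soup.find("div", id="wob_wc") instead of soup.find(id="wob_wc"). Both forms are valid; naming the tag only makes the match more specific.
The parser name is html.parser, not html-parser; the difference is the dot.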
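Also, Google usually returns a 200 response by default even when it is blocking you, so the status code alone does not tell you whether you were blocked. You generally need to check r.content.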
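One way to act on this is to check whether the element you expect is actually present in the body before using it. The sketch below is only an illustration: `extract_widget`, `ok_page`, and `blocked_page` are hypothetical names, and the two byte strings are made-up stand-ins for real response bodies.

```python
from bs4 import BeautifulSoup


def extract_widget(body, element_id="wob_wc"):
    """Return the <div> with the given id, or None if the page lacks it."""
    soup = BeautifulSoup(body, "html.parser")
    return soup.find("div", id=element_id)


# Hypothetical response bodies: one with the weather widget, one without.
ok_page = b'<html><body><div id="wob_wc"><span>31</span></div></body></html>'
blocked_page = b"<html><body><p>Unusual traffic detected</p></body></html>"

for body in (ok_page, blocked_page):
    widget = extract_widget(body)
    if widget is None:
        # find() returned None, so calling widget.find_all() would fail.
        print("widget missing; inspect the response body")
    else:
        print([span.get_text() for span in widget.find_all("span")])
```

Checking for `None` first avoids the `AttributeError` that would otherwise occur when the page comes back without the expected markup.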
I added headers, and now it works.
import requests
from bs4 import BeautifulSoup
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0'}
r = requests.get(
"https://www.google.com/search?q=phagwara+weather", headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
print(soup.find("div", id="wob_wc"))
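If you want to confirm what will be sent before contacting the server, a request can be prepared without being sent. This is a sketch, assuming the same URL and User-Agent as above; passing the query through `params` is a choice made for this example, not something the original code does.

```python
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) "
                  "Gecko/20100101 Firefox/74.0"
}

# Build the request object and prepare it; nothing is sent over the network.
request = requests.Request(
    "GET",
    "https://www.google.com/search",
    params={"q": "phagwara weather"},
    headers=headers,
)
prepared = request.prepare()

print(prepared.url)                    # final URL with the encoded query
print(prepared.headers["User-Agent"])  # header that would be sent
```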