I need to scrape the "People also ask" box: the questions and the answers.
I run a search on Google and then scrape the result page with BeautifulSoup.
import requests
from bs4 import BeautifulSoup
import html2text
import urllib.request
link = "https://www.google.com/search?client=firefox-b-d&source=hp&ei=v0mUXPu2ApTljwS6iLnABA&ei=lAyVXMPFCsaUsgXqmZT4DQ&q=what+is+java"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
page = requests.get(link, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
# For answers:
mydivs = soup.find_all('div', class_="ILfuVd NA6bn")
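The result is an empty list. I checked the HTML file, and the answers really are under that class.
Answer 0 (score: 0)
Google's front page updates as you type in the search box, so a simple request to the search page will not return those results.
You can go to https://google.com in your browser, open the developer tools panel (usually F12), and look at the Network tab to see the underlying requests made to the autocomplete API.
It looks like the endpoint is https://www.google.com/complete/search?q=yourQueryHere&client=psy-ab, which is easier to query than the HTML page: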
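import requests
query = "what is java"
# Pass the query through params so that requests URL-encodes the spaces
res = requests.get("https://google.com/complete/search", params={"client": "psy-ab", "q": query})
print(res.text)  # print the response body rather than the Response object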
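To compare what your script sends with the request shown in the Network tab, it can help to build the URL separately and look at it. This is only a minimal sketch: `build_autocomplete_url` is a made-up helper name, and the endpoint and the `client`/`q` parameters are simply the ones quoted in this answer, which Google may change.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the answer; not a documented, stable API.
BASE_URL = "https://www.google.com/complete/search"


def build_autocomplete_url(query, client="psy-ab"):
    """Return the autocomplete URL with the query safely URL-encoded."""
    return BASE_URL + "?" + urlencode({"client": client, "q": query})


print(build_autocomplete_url("what is java"))
# https://www.google.com/complete/search?client=psy-ab&q=what+is+java
```

Encoding the query this way also keeps characters such as `&` or `+` in the search text from being read as part of the URL syntax.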
Also, Google probably does not want people scraping it, so you may get rate-limited if you make too many requests.
Answer 1 (score: 0)
people-also-ask may be helpful for you.
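One way to stay polite is to leave a minimum gap between consecutive requests. The sketch below is an illustration, not something Google documents: `fetch_throttled` and the 1-second default are assumptions, and `fetch` stands for whatever function actually performs the request.

```python
import time


def fetch_throttled(queries, fetch, min_interval=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch(query) for each query, at least min_interval seconds apart."""
    results = []
    last_call = None
    for query in queries:
        if last_call is not None:
            # Wait only for the part of the interval that has not passed yet.
            remaining = min_interval - (clock() - last_call)
            if remaining > 0:
                sleep(remaining)
        last_call = clock()
        results.append(fetch(query))
    return results
```

The `sleep` and `clock` parameters exist only so the helper can be tried out without real waiting; in normal use you would pass just the queries and a fetch function such as `lambda q: requests.get(url, params={"q": q})`.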
pip install people-also-ask
Usage example:
import people_also_ask
people_also_ask.get_related_questions("coffee", 5)
['How did coffee originate?',
'Is coffee good for your health?',
'Who brought coffee America?',
'Who invented coffee?',
'Why is coffee bad for you?',
'Why is drinking coffee bad for you?']
Answer 2 (score: 0)
One option is selenium's click method, or another library that can simulate clicks. Code and example (using SerpApi):
from serpapi import GoogleSearch
import os
params = {
    "engine": "google",
    "q": "what is java",
    "api_key": os.getenv("API_KEY"),
}
search = GoogleSearch(params)
results = search.get_dict()
for q_and_a in results['related_questions']:
    print(f"Question: {q_and_a['question']}\nAnswer: {q_and_a['snippet']}\n")
Disclaimer: I work for SerpApi.
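The loop in the SerpApi snippet indexes `results['related_questions']` and `q_and_a['snippet']` directly, so it raises a `KeyError` if either key is missing. A more defensive variant might look like the sketch below. `format_related_questions` is a hypothetical helper, the sample dictionary is made-up placeholder data rather than real API output, and only the key names are taken from the snippet above.

```python
def format_related_questions(results):
    """Format question/answer pairs, tolerating missing keys."""
    lines = []
    for item in results.get("related_questions", []):
        question = item.get("question", "(no question)")
        answer = item.get("snippet", "(no snippet)")
        lines.append(f"Question: {question}\nAnswer: {answer}")
    return lines


# Made-up placeholder data shaped like the dictionary used in the answer.
sample = {
    "related_questions": [
        {"question": "Q1", "snippet": "A1"},
        {"question": "Q2"},  # entry without a snippet
    ]
}

for line in format_related_questions(sample):
    print(line)
```

With real data you would pass the dictionary returned by `search.get_dict()` instead of `sample`.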