Question

I need to scrape the "People also ask" box: both the questions and the answers.

I run a search on Google and then scrape the result with BeautifulSoup.

import requests
from bs4 import BeautifulSoup
import html2text
import urllib.request

link = "https://www.google.com/search?client=firefox-b-d&source=hp&ei=v0mUXPu2ApTljwS6iLnABA&ei=lAyVXMPFCsaUsgXqmZT4DQ&q=what+is+java"

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
page = requests.get(link, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
# For answers:
mydivs = soup.find_all('div', class_="ILfuVd NA6bn")

The result is an empty list. I checked the HTML file, and the answers are in fact under that class.

Answer 1

Google's home page updates as you type in the search box, so a simple request to the search page will not give you the results.
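One way to check this for yourself is to count how many div elements in the downloaded HTML actually carry the classes you saw in the browser. The sketch below uses only the standard library; the helper names and the two HTML snippets are made up for illustration, and the class names are the ones from the question.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Counts <div> tags whose class attribute contains all wanted classes."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.matches = 0

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # The class attribute may be missing or empty, so fall back to "".
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.matches += 1


def count_divs_with_classes(html, wanted):
    finder = ClassFinder(wanted)
    finder.feed(html)
    return finder.matches


# Hypothetical snippets standing in for the browser DOM and the raw response.
rendered = '<div class="ILfuVd NA6bn"><span>Java is a language</span></div>'
raw = '<div id="main"><script>/* content filled in later */</script></div>'

print(count_divs_with_classes(rendered, ["ILfuVd", "NA6bn"]))  # 1
print(count_divs_with_classes(raw, ["ILfuVd", "NA6bn"]))       # 0
```

Running the same check on page.text from the question should tell you whether the classes are present in what requests received, as opposed to what the browser rendered.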

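You can go to https://google.com in your browser, open the developer tools panel (usually F12), and look at the Network tab to see the underlying requests made to the autocomplete API.

The endpoint appears to be https://www.google.com/complete/search?q=yourQueryHere&client=psy-ab, which is easier to query than the HTML page: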

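import requests

query = "what is java"
# Pass the query through params so that requests URL-encodes the spaces.
res = requests.get("https://www.google.com/complete/search", params={"client": "psy-ab", "q": query})
print(res.text)  # print the response body, not just the Response object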
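The query contains spaces, so it should be URL-encoded before it goes into the query string. As a minimal sketch, the URL can be built with the standard library; build_autocomplete_url and BASE_URL are illustrative names, and the endpoint and client value are simply the ones observed above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.google.com/complete/search"


def build_autocomplete_url(query, client="psy-ab"):
    # urlencode escapes spaces and other reserved characters in the values.
    return BASE_URL + "?" + urlencode({"client": client, "q": query})


print(build_autocomplete_url("what is java"))
# https://www.google.com/complete/search?client=psy-ab&q=what+is+java
```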

Also, Google probably does not want people scraping it, so you may get rate limited if you make too many requests.
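If you do hit a rate limit, one common approach is to retry with an increasing delay. The sketch below assumes the limit is signalled with HTTP status 429, which may not be how Google responds in every case; fetch_with_backoff and its parameters are hypothetical names, not part of requests.

```python
import time


def fetch_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a status other than 429 or attempts run out.

    fetch is any zero-argument callable returning an object with a
    status_code attribute, such as lambda: requests.get(url).
    """
    for attempt in range(max_attempts):
        response = fetch()
        if response.status_code != 429:  # assumption: 429 means "rate limited"
            return response
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
    return response
```

With requests this could be called as fetch_with_backoff(lambda: requests.get(url, headers=headers)). Passing the fetch function and the sleep function in as arguments keeps the helper easy to try out without touching the network.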

Answer 2

The people-also-ask package may help you.

pip install people-also-ask

Usage example:

import people_also_ask

people_also_ask.get_related_questions("coffee", 5)

['How did coffee originate?',
 'Is coffee good for your health?',
 'Who brought coffee America?',
 'Who invented coffee?',
 'Why is coffee bad for you?',
 'Why is drinking coffee bad for you?']

Answer 3

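To get the answers, you can use the selenium click method or another library that can simulate clicks.
Alternatively, extract the data directly from the JavaScript.
Or use the Google Related Questions API from SerpApi. It is a paid API with a free trial. Check the playground.

Code and example: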

from serpapi import GoogleSearch
import os

params = {
  "engine": "google",
  "q": "what is java",
  "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

for q_and_a in results['related_questions']:
  print(f"Question: {q_and_a['question']}\nAnswer: {q_and_a['snippet']}\n")
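The loop above raises a KeyError if the response has no related_questions key or if an entry lacks a snippet. A more tolerant version might look like the sketch below. The key names are the ones used in the code above, while extract_q_and_a and the sample dictionary are hypothetical and only illustrate the shape of the data.

```python
def extract_q_and_a(results):
    """Return (question, answer) pairs from a results dict shaped like the one above."""
    pairs = []
    for item in results.get("related_questions", []):
        question = item.get("question")
        if not question:
            continue  # skip entries that carry no question text
        pairs.append((question, item.get("snippet") or ""))
    return pairs


# Hypothetical response fragment, not real API output.
sample = {
    "related_questions": [
        {"question": "What is Java used for?", "snippet": "Example snippet."},
        {"question": "Is Java hard to learn?"},
    ]
}

for question, answer in extract_q_and_a(sample):
    print(f"Question: {question}\nAnswer: {answer}\n")
```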

Disclaimer: I work for SerpApi.

How do I scrape the "People also ask" box from Google search?