How to get the contents of the "feedback" box from a Google search?

Date: 2016-05-23 02:48:04

Tags: python-3.x beautifulsoup python-requests google-search google-search-api

When you ask a question or ask for the definition of a word in a Google search, Google gives you a summary of the answer in the "feedback" box.

For example, when you search for define apple, you get this result:

[Image: example of feedback]

To be clear, I don't need the whole page or the other results; I only need this box:

[Image: highlighted example of feedback]

How can I get the contents of this "feedback" box in Python 3 using the Requests and Beautiful Soup modules?

If that is not possible, can I use the Google Search API to get the contents of the "feedback" box?

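I found a similar question on SO, but the OP did not specify a language, there are no answers, and I'm afraid the two comments are outdated, since that question was asked almost 9 months ago.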

Thanks in advance for your time and help.

3 Answers:

Answer 0 (score: 2)

This is easy to do with requests and bs4: you just need to pull the text from the div with the class lr_dct_ent:

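import requests
from bs4 import BeautifulSoup

# Send a browser User-Agent; Google may serve different markup without one
h = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"}
r = requests.get("https://www.google.ie/search?q=define+apple", headers=h).text
# Name the parser explicitly to avoid bs4's "no parser was explicitly specified" warning
soup = BeautifulSoup(r, "lxml")

# The definition box; this class name dates from 2016 and may no longer exist
box = soup.select_one("div.lr_dct_ent")
if box is not None:
    print("\n".join(box.text.split(";")))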
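Because Google's markup changes, select_one() can easily return None, and calling .text on None raises AttributeError. The sketch below wraps the same select-and-split idea in a function that returns None when the box is missing. It runs against a small hypothetical HTML snippet (SAMPLE_HTML and extract_box_text are illustrative names, not part of the answer) so it can be tried without hitting Google; it also strips the pieces around each ";".

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like the 2016 answer box; the real page
# structure and class names have likely changed since then.
SAMPLE_HTML = """
<div class="lr_dct_ent">
  <div class="lr_dct_sf_h">noun</div>
  <span>a fruit; a tree</span>
</div>
"""


def extract_box_text(page_html, selector="div.lr_dct_ent"):
    """Return the box text split on ';' into lines, or None if no box matches."""
    soup = BeautifulSoup(page_html, "html.parser")
    box = soup.select_one(selector)
    if box is None:
        # select_one() returns None when nothing matches the selector
        return None
    # Join the text nodes with spaces and drop whitespace-only nodes
    text = box.get_text(" ", strip=True)
    return "\n".join(part.strip() for part in text.split(";"))


print(extract_box_text(SAMPLE_HTML))
print(extract_box_text("<html><body>no box here</body></html>"))
```

Returning None instead of raising lets the caller decide what to do when a query simply has no answer box.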

The main text is in an ordered list, and the part of speech ("noun") is in the div with the class lr_dct_sf_h:

In [11]: r = requests.get("https://www.google.ie/search?q=define+apple", headers=h).text
In [12]: soup = BeautifulSoup(r,"lxml")    
In [13]: div = soup.select_one("div.lr_dct_ent")    
In [14]: n_v = div.select_one("div.lr_dct_sf_h").text   
In [15]: expl = [li.text for li in div.select("ol.lr_dct_sf_sens li")]    
In [16]: print(n_v)
noun

In [17]: print("\n".join(expl))
1. the round fruit of a tree of the rose family, which typically has thin green or red skin and crisp flesh.used in names of unrelated fruits or other plant growths that resemble apples in some way, e.g. custard apple, oak apple.
used in names of unrelated fruits or other plant growths that resemble apples in some way, e.g. custard apple, oak apple.
2. the tree bearing apples, with hard pale timber that is used in carpentry and to smoke food.
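In the output above, the "used in names of unrelated fruits..." line appears twice. The likely cause is that the descendant selector "ol.lr_dct_sf_sens li" also matches the nested sub-sense li, while the parent li's text already contains that nested text. The sketch below reproduces this on hypothetical, simplified markup and shows that the child combinator ">" selects only the top-level senses. It assumes the nested list is a plain ol without the lr_dct_sf_sens class.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: sense 1 contains a nested sub-sense list
SAMPLE_HTML = """
<div class="lr_dct_ent">
  <div class="lr_dct_sf_h">noun</div>
  <ol class="lr_dct_sf_sens">
    <li>the round fruit of a tree.<ol><li>used in names of unrelated fruits.</li></ol></li>
    <li>the tree bearing apples.</li>
  </ol>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
div = soup.select_one("div.lr_dct_ent")

# Descendant selector: also matches the nested <li>, so its text is
# printed once inside sense 1 and once more on its own
all_items = [li.get_text() for li in div.select("ol.lr_dct_sf_sens li")]

# Child combinator: only the direct <li> children of the senses list
top_level = [li.get_text() for li in div.select("ol.lr_dct_sf_sens > li")]

print(len(all_items), len(top_level))
for number, sense in enumerate(top_level, start=1):
    print(f"{number}. {sense}")
```

The first sense still reads "tree.used" because get_text() concatenates adjacent text nodes; passing a separator such as get_text(" ") would put a space there.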

Answer 1 (score: 0)

Good question.

Run the program as: python3 defineterm.py apple

#! /usr/bin/env python3
# defineterm.py: print Google's definition for a term, e.g. python3 defineterm.py apple

import sys

import requests
from bs4 import BeautifulSoup

searchterm = ' '.join(sys.argv[1:])

url = 'https://www.google.com/search'
try:
    # Pass the query via params so requests URL-encodes it
    res = requests.get(url, params={'q': 'define ' + searchterm})
    res.raise_for_status()
except requests.RequestException as exc:
    print('error while loading page occurred: ' + str(exc))
    sys.exit(1)

# BeautifulSoup decodes HTML entities itself; unescaping first could corrupt the markup
soup = BeautifulSoup(res.text, 'lxml')
prettytext = soup.prettify()

# The next lines are for analysis (they save the raw page); you can comment them out
with open('rawpage.txt', 'w', encoding='utf-8') as frawpage:
    frawpage.write(prettytext)

firsttag = soup.find('h3', class_='r')
if firsttag is not None:
    print(firsttag.get_text())
    print()

# The second tag may change, so check it if it does not return the correct result.
# The same may apply to all of the searched tags.
secondtag = soup.find('div', {'style': 'color:#666;padding:5px 0'})
if secondtag is not None:
    print(secondtag.get_text())
    print()

termtags = soup.find_all('li', {'style': 'list-style-type:decimal'})

# Number the definitions starting at 1
for count, tag in enumerate(termtags, start=1):
    print(str(count) + '. ' + tag.get_text())
    print()
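The original script built the URL with 'define+' + searchterm, which leaves spaces and characters such as "&" or "#" unencoded; an "&" would start a new query parameter. The sketch below uses the standard library's urllib.parse.urlencode to show the encoding that requests.get(url, params=...) performs for you. build_define_url is a hypothetical helper name used only for illustration.

```python
from urllib.parse import urlencode


def build_define_url(words, base="https://www.google.com/search"):
    """Join command-line words into a 'define ...' query and percent-encode it."""
    query = "define " + " ".join(words)
    # urlencode() encodes spaces as '+' and reserved characters as %XX
    return base + "?" + urlencode({"q": query})


print(build_define_url(["apple"]))
print(build_define_url(["black", "&", "white"]))
```

The second call yields q=define+black+%26+white, so the "&" stays part of the search term.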

Make the script executable (for example with chmod +x defineterm.py).

Then add this line to ~/.bashrc:

alias defterm="/data/Scrape/google/defineterm.py "

Adjust the path to where the script is saved on your machine.

Then run:

source ~/.bashrc

After that, the program can be started with:

defterm apple (or any other term)

Answer 2 (score: 0)

The easiest way is to get the CSS selectors for this text with SelectorGadget:

from bs4 import BeautifulSoup
import requests

headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Pass the query via params so requests URL-encodes it
html = requests.get('https://www.google.de/search', params={'q': 'define apple'}, headers=headers)
# The 'lxml' parser must be installed (pip install lxml), but it does not need to be imported
soup = BeautifulSoup(html.text, 'lxml')

# These class names are generated by Google and may change at any time
syllables = soup.select_one('.frCXef span').text
phonetic = soup.select_one('.g30o5d span span').text
noun = soup.select_one('.h3TRxf span').text
print(f'{syllables}\n{phonetic}\n{noun}')

# Output:
'''
ap·ple
ˈapəl
the round fruit of a tree of the rose family, which typically has thin red or green skin and crisp flesh. Many varieties have been developed as dessert or cooking fruit or for making cider.
'''
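Since generated class names like .frCXef tend to change, each select_one(...).text line above can fail with AttributeError as soon as one selector stops matching. A minimal sketch of a safer pattern, assuming you prefer None for missing fields over a crash: text_or_none is a hypothetical helper, and SAMPLE_HTML is made-up markup that reuses the answer's selectors, with the noun element deliberately left out.

```python
from bs4 import BeautifulSoup


def text_or_none(soup, selector):
    """Return the stripped text of the first match for selector, or None."""
    element = soup.select_one(selector)
    return element.get_text(strip=True) if element is not None else None


# Hypothetical snippet reusing the selectors above; .h3TRxf is absent on purpose
SAMPLE_HTML = """
<div class="frCXef"><span>ap·ple</span></div>
<div class="g30o5d"><span><span>ˈapəl</span></span></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
fields = {
    "syllables": text_or_none(soup, ".frCXef span"),
    "phonetic": text_or_none(soup, ".g30o5d span span"),
    "noun": text_or_none(soup, ".h3TRxf span"),  # no match here, so None
}
print(fields)
```

A None value in fields is a quick signal that a selector needs to be refreshed with SelectorGadget.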

Alternatively, you can do the same thing with the Google Direct Answer Box API from SerpApi. It is a paid API with a free trial of 5,000 searches.

Code to integrate:

from serpapi import GoogleSearch

params = {
  "api_key": "YOUR_API_KEY",
  "engine": "google",
  "q": "define apple",
  "google_domain": "google.com",
}

search = GoogleSearch(params)
results = search.get_dict()

syllables = results['answer_box']['syllables']
phonetic = results['answer_box']['phonetic']
noun = results['answer_box']['definitions'][0] # array output
print(f'{syllables}\n{phonetic}\n{noun}')

# Output:
'''
ap·ple
ˈapəl
the round fruit of a tree of the rose family, which typically has thin red or green skin and crisp flesh. Many varieties have been developed as dessert or cooking fruit or for making cider.
'''
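Not every query returns an answer box, so indexing results['answer_box'] directly raises KeyError when it is missing. The sketch below reads the same keys used above (answer_box, syllables, phonetic, definitions) with dict.get() instead. The sample dict is a hypothetical, trimmed-down stand-in for the dict returned by search.get_dict(), not a real API response, and summarize_answer_box is an illustrative name.

```python
def summarize_answer_box(results):
    """Extract syllables, phonetic, and the first definition from a results dict.

    Returns None when the response has no answer box.
    """
    box = results.get("answer_box")
    if not box:
        return None
    definitions = box.get("definitions") or []
    return {
        "syllables": box.get("syllables"),
        "phonetic": box.get("phonetic"),
        "definition": definitions[0] if definitions else None,
    }


# Hypothetical sample using only the keys read in the answer above
sample = {
    "answer_box": {
        "syllables": "ap·ple",
        "phonetic": "ˈapəl",
        "definitions": ["the round fruit of a tree of the rose family"],
    }
}

print(summarize_answer_box(sample))
print(summarize_answer_box({"organic_results": []}))
```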

Disclaimer: I work for SerpApi.