Searching Google with Python

Time: 2018-02-26 18:12:03

Tags: python google-search

I am looking for any Python API for searching Google (normal search), and I only found the following code:

import pprint
from googleapiclient.discovery import build

def google_search(search_term, api_key, cse_id, **kwargs):
    service = build("customsearch", "v1", developerKey=api_key)
    res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
    return res['items']

my_api_key = "Google_API_Key"
my_cse_id = "my_cse_id"
results = google_search('stackoverflow site:en.wikipedia.org', my_api_key, my_cse_id, num=10)
for result in results:
    pprint.pprint(result)

First, it produces this error, which I tried to fix without success:

result = self.google_searchS('stackoverflow wikipedia.org', my_api_key, my_cse_id)

TypeError: google_searchS() takes 3 positional arguments but 4 were given

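Another question: is there any other API for normal (non-custom) search?

2 answers:

Answer 0: (score: 1)

In my case, I am using Python 3.7. I also enabled "Search the entire web" in the search engine settings,

and changed the code as follows: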

from googleapiclient.discovery import build
import pprint

my_api_key = "ASDFASDF4TzB-FASDGDFG9n3Wfdsdffasd3PQ"
my_cse_id = "2199288337529387487289738:8asfiejkfjke"

def google_search(search_term, api_key, cse_id, **kwargs):
    service = build("customsearch", "v1", developerKey=api_key)
    res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
    return res['items']

results = google_search(
    'open world', my_api_key, my_cse_id, num=10)
for result in results:
    pprint.pprint(result)

It prints 10 results.

Note: before posting, I obfuscated my api_key and cse_id. I also set up a venv beforehand, following the linked instructions.
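One detail worth guarding against in the function above: as far as I know, the Custom Search JSON response leaves out the `items` key when a query matches nothing, so `res['items']` would raise a `KeyError`. Below is a minimal sketch of a more defensive way to read the response. The helper name `extract_results` and the sample dictionary are made up for illustration; only the `items`, `title` and `link` field names come from the API response format.

```python
def extract_results(response):
    """Return (title, link) pairs from a Custom Search response dict."""
    # 'items' may be missing when there are no results, so default to [].
    return [
        (item.get("title"), item.get("link"))
        for item in response.get("items", [])
    ]


# Hypothetical response data, shaped like what .execute() returns.
sample_response = {
    "items": [
        {"title": "Open world - Wikipedia",
         "link": "https://en.wikipedia.org/wiki/Open_world"},
    ]
}

for title, link in extract_results(sample_response):
    print(title, link)

# An empty response simply yields an empty list instead of raising.
print(extract_results({}))
```

In `google_search`, this would amount to returning `res.get('items', [])` instead of `res['items']`.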

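Answer 1: (score: 0)

I found the solution: the problem was in how the function was called inside a class. In any case, it did not help me. Why? The Google Custom Search API is meant only for adding a search option to a website, and the engine has to be customized; it also does not return the same results as a normal Google search.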
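The class-call problem can be reproduced with a small sketch. The question does not show the class itself, so this assumes the function was defined inside a class without a `self` parameter; the class names `BrokenSearcher` and `FixedSearcher` are hypothetical. When such a method is called on an instance, Python passes the instance as an extra first argument, which gives the "takes 3 positional arguments but 4 were given" message.

```python
class BrokenSearcher:
    # Assumed shape of the original bug: `self` is missing, so the
    # instance is passed in as `search_term` and everything shifts by one.
    def google_search(search_term, api_key, cse_id):
        return (search_term, api_key, cse_id)


class FixedSearcher:
    # Adding `self` as the first parameter fixes the call.
    def google_search(self, search_term, api_key, cse_id):
        return (search_term, api_key, cse_id)


def call_broken():
    """Call the broken method and return the TypeError message, if any."""
    try:
        BrokenSearcher().google_search("q", "key", "cse")
    except TypeError as exc:
        return str(exc)
    return None


print(call_broken())
print(FixedSearcher().google_search("q", "key", "cse"))
```

Declaring the method with `@staticmethod` would be another way to keep the three-parameter signature.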

But to use the "normal" Google search engine, I got this code:

import requests
from bs4 import BeautifulSoup

# Method of a class (hence `self`); the enclosing class is not shown in the post.
def searchGo(self, search_term, number_results, language_code):
    USER_AGENT = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
    assert isinstance(search_term, str), 'Search term must be a string'
    assert isinstance(number_results, int), 'Number of results must be an integer'
    # Only spaces are escaped here; other special characters are passed through as-is.
    escaped_search_term = search_term.replace(' ', '+')
    google_url = 'https://www.google.com/search?q={}&num={}&hl={}'.format(escaped_search_term, number_results, language_code)
    response = requests.get(google_url, headers=USER_AGENT)
    response.raise_for_status()
    # Parsing. The class names below ('g', 'r', 'st') match Google's result
    # markup at the time of the post and may have changed since.
    soup = BeautifulSoup(response.text, 'html.parser')
    found_results = []
    rank = 1
    result_block = soup.find_all('div', attrs={'class': 'g'})
    for result in result_block:
        link = result.find('a', href=True)
        title = result.find('h3', attrs={'class': 'r'})
        description = result.find('span', attrs={'class': 'st'})
        if link and title:
            link = link['href']
            title = title.get_text()
            if description:
                description = description.get_text()
            if link != '#':
                found_results.append({'link': link, 'keyword': search_term, 'rank': rank, 'title': title, 'description': description})
                rank += 1
    return found_results

You can call it like this:

results = ts.searchGo('obama spent two millions for Egyption', 10, 'en')
for r in results:
    print(str(r['rank']) + "\n" + str(r['title']) + "\n" + str(r['description']) + "\n")
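The URL construction in `searchGo` only replaces spaces, so a query containing characters such as `&` or `+` would end up as a malformed query string. A possible alternative, sketched here with the standard library's `urllib.parse.urlencode`, escapes every parameter. The helper name `build_search_url` is made up for this example; the `q`, `num` and `hl` parameters are the ones used in the code above.

```python
from urllib.parse import urlencode


def build_search_url(search_term, number_results, language_code):
    """Build the search URL with all query parameters percent-encoded."""
    params = {"q": search_term, "num": number_results, "hl": language_code}
    # urlencode escapes each value; spaces become '+', '&' becomes '%26'.
    return "https://www.google.com/search?" + urlencode(params)


print(build_search_url("open world", 10, "en"))
print(build_search_url("C++ & Python", 5, "en"))
```

With `requests`, passing the same dictionary as `params=` to `requests.get` would have a similar effect without building the string by hand.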