Python script to get 150 Google search results

Time: 2015-09-25 09:31:38

Tags: python

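I need to get the first 15 pages of Google search results using Python. I tried the answer in Extract Google Search Results, but I did not get the results I expected. I need 150 search results with their original links, retrieved with Python. If anyone knows how, please give me a solution. Thanks in advance.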

2 answers:

Answer 0: (score: 0)

I got 150 search results this way:

import sys # Used to add the BeautifulSoup folder to the import path
import urllib2 # Used to read the html document

if __name__ == "__main__":
    ### Import Beautiful Soup
    ### Here, I have the BeautifulSoup folder at the same level as this Python script
    ### So I need to tell Python where to look.
    sys.path.append("./BeautifulSoup")
    from BeautifulSoup import BeautifulSoup

    ### Create opener with Google-friendly user agent
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]

    ### Open page & generate soup
    ### The "start" variable is used to iterate through 15 pages (offsets 0, 10, ..., 140).
    for start in range(0,15):
        url = "http://www.google.com/search?q=site:stackoverflow.com&start=" + str(start*10)
        page = opener.open(url)
        soup = BeautifulSoup(page)

        ### Parse and find
        ### Looks like google contains URLs in <cite> tags.
        ### So for each cite tag on each page (10), print its contents (url)
        for cite in soup.findAll('cite'):
            print cite.text

You just need to install BeautifulSoup first: pip install BeautifulSoup

The code is from the link you referenced: Extract Google Search Results
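The script above targets Python 2 (urllib2, the print statement). For Python 3, the same idea could be sketched with only the standard library, as below. This is a rough sketch, not a tested scraper: the names page_urls, CiteCollector, extract_cites, fetch and main are hypothetical, and it assumes, as the answer does, that each result URL sits inside a <cite> tag. Google may change that markup or refuse automated requests.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen

RESULTS_PER_PAGE = 10


def page_urls(query, pages=15):
    """Build one search URL per result page (start = 0, 10, ..., 140)."""
    base = "http://www.google.com/search?"
    return [base + urlencode({"q": query, "start": page * RESULTS_PER_PAGE})
            for page in range(pages)]


class CiteCollector(HTMLParser):
    """Collect the text found inside every <cite> tag (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.cites = []
        self._depth = 0      # > 0 while we are inside a <cite> element
        self._buffer = []    # text pieces of the current <cite>

    def handle_starttag(self, tag, attrs):
        if tag == "cite":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "cite" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.cites.append("".join(self._buffer).strip())
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_cites(html):
    """Return the text of all <cite> tags in an HTML string."""
    collector = CiteCollector()
    collector.feed(html)
    collector.close()
    return collector.cites


def fetch(url):
    """Download one page, sending the same User-agent header as the answer."""
    request = Request(url, headers={"User-agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


def main():
    # 15 pages x 10 results per page = up to 150 results.
    for url in page_urls("site:stackoverflow.com"):
        for cite in extract_cites(fetch(url)):
            print(cite)
```

Calling main() performs the 15 network requests; page_urls and extract_cites can be tried on their own without any network access.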

Answer 1: (score: 0)

Alternatively, you can use the SERP API repo

Python wrapper

The instructions are simple:

pip install google-search-results

and the usage is:

from lib.google_search_results import GoogleSearchResults
query = GoogleSearchResults({"q": "coffee"})
html_results = query.get_html()

More advanced usage is on the SERP API GitHub.
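The usage snippet in this answer stops at an HTML result, while the question asks for the original links. Assuming get_html() returns the page as a string (not confirmed here), the links could be pulled out with the standard library. The sketch below collects the href of every <a> tag and unwraps hrefs of the form /url?q=..., a redirect shape Google has used on plain HTML result pages. The helper names and the sample markup are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def unwrap_redirect(href):
    """Return the target of a '/url?q=<target>' link, else href unchanged."""
    parsed = urlparse(href)
    if parsed.path == "/url":
        target = parse_qs(parsed.query).get("q")
        if target:
            return target[0]
    return href


def extract_links(html):
    """Return all link targets in an HTML string, with redirects unwrapped."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [unwrap_redirect(href) for href in collector.links]


# Hypothetical markup standing in for a real result page.
sample = ('<a href="/url?q=https://example.com/page&amp;sa=U">first</a>'
          '<a href="https://example.org/">second</a>')
print(extract_links(sample))
```

A real result page also contains navigation and footer links, so the returned list would likely need filtering before it counts as search results.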