Is there a way to change the number of search results you get when searching Google with the requests library?

Asked: 2020-01-28 16:53:19

Tags: python beautifulsoup python-requests

Right now, the script I've written only gets 10 results, and I'd like to increase that to 50.

Is there a way to do this with the requests library? Sorry the code is crammed together; when I first wrote it I wasn't planning to collaborate on it, so I left out comments and the like.

I'm not sure what else to add here, but the site says my post is mostly code and asks for more details. It really is a very simple question, and I don't have much more to say.

Does anyone know how to set a parameter to get 50 results per page instead of the default 10?

Here is the code I'm currently using:

    import itertools
    import sys
    import threading
    import time
    from subprocess import call

    import requests
    from bs4 import BeautifulSoup

    # Check connection
    def connected_to_internet(url='http://www.google.com/', timeout=5):
        try:
            _ = requests.get(url, timeout=timeout)
        except requests.ConnectionError:
            print("No internet connection. Please connect to the internet and try again.")
            sys.exit()
    connected_to_internet()

    def clearscreen():
        _ = call('clear')

    # Desktop user-agent
    USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
    # Mobile user-agent (unused here)
    MOBILE_USER_AGENT = "Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36"

    # Ask the user what to search for
    print("What is the article about?")
    query = input()

    done = False
    def animate():
        # Spinner shown while the search runs; stops when done is set
        for c in itertools.cycle(['|', '/', '-', '\\']):
            if done:
                break
            sys.stdout.write('\r' + c)
            sys.stdout.flush()
            time.sleep(0.1)
        sys.stdout.write('\r')

    t = threading.Thread(target=animate)
    t.start()
    print("")
    time.sleep(1)
    clearscreen()

    print("Searching Google for information about: ", query)
    squery = query.replace(' ', '+')
    URL = f"https://google.com/search?q={squery}"
    headers = {"user-agent": USER_AGENT}
    resp = requests.get(URL, headers=headers)

    # First-stage scrape of Google
    results = []
    if resp.status_code == 200:
        soup = BeautifulSoup(resp.content, "html.parser")
        # Grab all the URLs from the first page of SERPs
        for g in soup.find_all('div', class_='r'):
            anchors = g.find_all('a')
            if anchors:
                link = anchors[0]['href']
                title = g.find('h3').text
                # Append the URL string itself; the original wrapped it in a
                # one-element set and then unpacked it with str() and replace()
                results.append(link)

    done = True  # stop the spinner thread

    # List of URLs for the second-stage scrape
    qresults = results
    # Identify the number of URLs for later comparison
    numresults = len(qresults)
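
One thing I've come across is that Google's search URL is supposed to accept a `num` parameter for results per page (and a `start` parameter for paging), though I can't tell whether it is still honored, so treat this as an untested sketch of what I'm after:

    # Untested sketch: ask for 50 results per page via the num parameter;
    # start would page further (0, 50, 100, ...). Assumes Google still
    # honors num/start. Letting requests build the query string also
    # URL-encodes the query, so the manual space-to-'+' replace above
    # wouldn't be needed.
    params = {"q": query, "num": 50, "start": 0}
    resp = requests.get("https://google.com/search", params=params, headers=headers)

If that works, the parsing loop above should apply unchanged to the longer results page.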

0 Answers:

No answers yet.