Using Python requests with Google Search

Time: 2021-04-05 18:31:26

Tags: python beautifulsoup python-requests google-search

I'm new to Python. In PyCharm, I wrote this code:

import requests
from bs4 import BeautifulSoup

response = requests.get(f"https://www.google.com/search?q=fitness+wear")
soup = BeautifulSoup(response.text, 'html.parser')
print(soup)

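Instead of the HTML of the search results, I get the HTML of the following page: (screenshot)

I used the same code in a script on pythonanywhere.com and it works fine there. I have tried many of the solutions I found, but the result is always the same, so now I'm stuck.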

3 Answers:

Answer 0 (score: 2)

I think this should work:

import requests
from bs4 import BeautifulSoup

with requests.Session() as s:
    url = "https://www.google.com/search?q=fitness+wear"
    headers = {
        # The value is the URL only, without a "referer: " prefix.
        "referer": "https://www.google.com/",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36"
    }
    # Initial POST, intended to pick up any cookies before the real request.
    s.post(url, headers=headers)
    response = s.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
print(soup)

It uses a requests session and a POST request to create any initial cookies (I'm not entirely sure about this), which then lets you scrape.
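To see what the session contributes, here is a minimal offline sketch. It assumes nothing about Google's responses: the cookie name and value are hypothetical stand-ins for whatever a server would set, and no request is actually sent. It only shows that headers and cookies stored on a `requests.Session` are attached to every request prepared through it.

```python
import requests

# Headers from the answer above, with the referer value holding only the URL.
headers = {
    "referer": "https://www.google.com/",
    "user-agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36"
    ),
}

with requests.Session() as s:
    s.headers.update(headers)
    # Hypothetical cookie, standing in for one that a server response would set.
    s.cookies.set("EXAMPLE_COOKIE", "example-value")

    # Prepare (but do not send) two requests through the same session.
    first = s.prepare_request(
        requests.Request("GET", "https://www.google.com/search", params={"q": "fitness wear"})
    )
    second = s.prepare_request(
        requests.Request("GET", "https://www.google.com/search", params={"q": "running shoes"})
    )

for prepared in (first, second):
    print(prepared.url)
    print(prepared.headers["referer"], "|", prepared.headers["Cookie"])
```

Both prepared requests carry the same referer, user agent and cookie, which is why a cookie picked up by an earlier request is still there for the later `s.get(...)`.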

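Answer 1 (score: 1)

If you open a private window in your browser and go to google.com, you should see the same popup asking for your consent. That is because you are not sending a session cookie.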
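A rough way to tell from a script whether you received the consent page rather than results is to look for telltale strings in the HTML. This is only a heuristic sketch: the function name `looks_like_consent_page` is hypothetical, and the two marker strings are assumptions about what the consent page contains, so adjust them to what you actually see in `response.text`.

```python
def looks_like_consent_page(html: str) -> bool:
    """Heuristic check for a consent interstitial.

    The markers are assumptions, not a documented contract: the consent
    domain name and the English heading text of the popup.
    """
    markers = ("consent.google.com", "before you continue")
    lowered = html.lower()
    return any(marker in lowered for marker in markers)


# Small made-up snippets, used only to exercise the helper.
consent_html = '<form action="https://consent.google.com/save"><h1>Before you continue</h1></form>'
results_html = '<div class="result"><h3>Fitness wear</h3></div>'

print(looks_like_consent_page(consent_html))
print(looks_like_consent_page(results_html))
```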

You have several options to solve this. One is to send the cookies you can observe on the website directly, like this:

import requests

# Cookie value copied from the browser; "..." stands for any other cookies you observe there.
cookies = {"CONSENT": "YES+shp.gws-20210330-0-RC1.de+FX+412"}  # ...

resp = requests.get("https://www.google.com/search?q=fitness+wear", cookies=cookies)

The solution from @Dimitriy Kruglikov is cleaner; using a session is a good way to keep a persistent session with a website.
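If you want to check what a cookie dictionary actually puts on the wire before hitting Google, you can build the request without sending it. A minimal sketch, assuming the `CONSENT` value quoted above (it was copied from a browser in 2021 and may no longer be accepted); passing the query through `params=` is my own variation, not part of the original answer.

```python
import requests

# Cookie value taken from the answer above; treat it as an example only.
cookies = {"CONSENT": "YES+shp.gws-20210330-0-RC1.de+FX+412"}

# Build the request without sending it, so the result can be inspected offline.
request = requests.Request(
    "GET",
    "https://www.google.com/search",
    params={"q": "fitness wear"},  # requests URL-encodes the query string
    cookies=cookies,
)
prepared = request.prepare()

print(prepared.url)                # the final URL, including the encoded query
print(prepared.headers["Cookie"])  # the Cookie header that would be sent
```

To really send it, pass `prepared` to `requests.Session().send(prepared)`, or just call `requests.get(...)` with the same `params` and `cookies` arguments.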

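Answer 2 (score: 0)

Google is not blocking you; you can still extract data from the HTML.

Using cookies is not very convenient, and using a session with both a POST and a GET request generates more traffic.

You can remove this popup with the BS4 methods decompose() or extract():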

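  • annoying_popup.decompose() destroys the tag and its contents completely. See the Beautiful Soup documentation.

  • annoying_popup.extract() removes the tag from the tree and returns it, leaving two HTML trees: one rooted at the BeautifulSoup object you used to parse the document, and one rooted at the extracted tag. See the Beautiful Soup documentation.

After that, you can scrape everything you need. In fact, you can do so even without removing the popup.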
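Here is a small sketch of both methods on made-up markup. The HTML, the `consent-popup` id and the `result` class are hypothetical placeholders, not Google's real page structure, so you would need to find the actual popup element yourself.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a results page with a consent popup.
html = """
<html><body>
  <div id="consent-popup"><p>Before you continue</p><button>I agree</button></div>
  <div class="result"><h3>Fitness wear shop</h3></div>
  <div class="result"><h3>Workout clothes guide</h3></div>
</body></html>
"""

# decompose(): the popup and everything inside it is destroyed.
soup_a = BeautifulSoup(html, "html.parser")
soup_a.find("div", id="consent-popup").decompose()
titles_after_decompose = [h3.get_text() for h3 in soup_a.find_all("h3")]

# extract(): the popup is detached and returned as the root of its own tree.
soup_b = BeautifulSoup(html, "html.parser")
removed = soup_b.find("div", id="consent-popup").extract()
popup_text = removed.get_text(" ", strip=True)
popup_still_in_tree = soup_b.find("div", id="consent-popup") is not None

# No removal at all: select only the elements you care about.
soup_c = BeautifulSoup(html, "html.parser")
titles_untouched = [div.h3.get_text() for div in soup_c.find_all("div", class_="result")]

print(titles_after_decompose)
print(popup_text, popup_still_in_tree)
print(titles_untouched)
```

The last variant is usually enough: if your selectors target the result elements directly, the popup never shows up in what you scrape.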

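See the Organic Results extraction I did recently. It scrapes the title, snippet, and link from Google search results.

Alternatively, you can use the Google Search Engine Results API from SerpApi. Check out the Playground.

Code and example in the online IDE: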

from serpapi import GoogleSearch
import os

params = {
  "engine": "google",
  "q": "fus ro dah",
  "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

for result in results['organic_results']:
  print(f"Title: {result['title']}\nSnippet: {result['snippet']}\nLink: {result['link']}\n")

Output:

Title: Skyrim - FUS RO DAH (Dovahkiin) HD - YouTube
Snippet: I looked around for a fan made track that included Fus Ro Dah, but the ones that I found were pretty bad - some ...
Link: https://www.youtube.com/watch?v=JblD-FN3tgs

Title: Unrelenting Force (Skyrim) | Elder Scrolls | Fandom
Snippet: If the general subtitles are turned on, it can be seen that the text for the Draugr's Unrelenting Force is misspelled: "Fus Rah Do" instead of the proper "Fus Ro Dah." ...
Link: https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)

Title: Fus Ro Dah | Know Your Meme
Snippet: Origin. "Fus Ro Dah" are the words for the "unrelenting force" thu'um shout in the game Elder Scrolls V: Skyrim. After reaching the first town of ...
Link: https://knowyourmeme.com/memes/fus-ro-dah

Title: Fus ro dah - Urban Dictionary
Snippet: 1. A dragon shout used in The Elder Scrolls V: Skyrim. 2.An international term for oral sex given by a female. ex.1. The Dragonborn yelled "Fus ...
Link: https://www.urbandictionary.com/define.php?term=Fus%20ro%20dah

Part of the JSON:

"organic_results": [
  {
    "position": 1,
    "title": "Unrelenting Force (Skyrim) | Elder Scrolls | Fandom",
    "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)",
    "displayed_link": "https://elderscrolls.fandom.com › wiki › Unrelenting_F...",
    "snippet": "If the general subtitles are turned on, it can be seen that the text for the Draugr's Unrelenting Force is misspelled: \"Fus Rah Do\" instead of the proper \"Fus Ro Dah.\" ...",
    "sitelinks": {
      "inline": [
        {
          "title": "Location",
          "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Location"
        },
        {
          "title": "Effect",
          "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Effect"
        },
        {
          "title": "Usage",
          "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Usage"
        },
        {
          "title": "Word Wall",
          "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Word_Wall"
        }
      ]
    },
    "cached_page_link": "https://webcache.googleusercontent.com/search?q=cache:K3LEBjvPps0J:https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)+&cd=17&hl=en&ct=clnk&gl=us"
  }
]
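The loop above indexes `result['snippet']` directly, which raises `KeyError` if a result lacks that key. Below is a more defensive sketch that also pulls out the inline sitelinks shown in the JSON. It runs on a trimmed copy of that JSON rather than a live API call; the helper name `summarize` is hypothetical, and the second entry deliberately omits `snippet` and `sitelinks` on the assumption that not every result carries them.

```python
# Trimmed copy of the JSON above, plus one entry without snippet or sitelinks.
results = {
    "organic_results": [
        {
            "position": 1,
            "title": "Unrelenting Force (Skyrim) | Elder Scrolls | Fandom",
            "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)",
            "snippet": "If the general subtitles are turned on, ...",
            "sitelinks": {
                "inline": [
                    {"title": "Location", "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Location"},
                    {"title": "Effect", "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Effect"},
                ]
            },
        },
        {
            "title": "Fus Ro Dah | Know Your Meme",
            "link": "https://knowyourmeme.com/memes/fus-ro-dah",
        },
    ]
}


def summarize(results: dict) -> list:
    """Flatten organic results, tolerating missing optional keys."""
    rows = []
    for result in results.get("organic_results", []):
        inline = result.get("sitelinks", {}).get("inline", [])
        rows.append({
            "title": result.get("title", ""),
            "link": result.get("link", ""),
            "snippet": result.get("snippet", ""),
            "sitelinks": [item["title"] for item in inline],
        })
    return rows


rows = summarize(results)
for row in rows:
    print(f"Title: {row['title']}\nSitelinks: {row['sitelinks']}\nLink: {row['link']}\n")
```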

Disclaimer: I work for SerpApi.