Scraping Google Shopping with Python

Time: 2017-05-09 16:06:11

Tags: python web-scraping web-crawler

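I need to scrape Google Shopping, for example this link: https://www.google.com/?gfe_rd=cr&ei=BtcRWeX_D8aAsAHDgZ2QAw#q=hooker+furniture+5183-75300&tbm=shop

But the server's response contains only text, without the items. I can't see the item details even in Chrome's source viewer. What request will get me the detailed data for all items?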

1 answer:

Answer 0: (score: 1)

You can do this in one of two ways:

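  • Use the beautifulsoup + requests libraries. Selenium isn't needed, because everything you need is in the HTML source. View the source with Ctrl+U before deciding which tool to scrape it with. Also make sure you send a User-Agent request header; without one, Google may not treat the script as a real browser.
  • Use the third-party Google Shopping Results API from SerpApi (see the end of this answer).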
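
The "look at the HTML source first" step can also be done on the HTML your script actually received. Below is a minimal sketch using only the standard library: it counts the elements that carry a given CSS class in an HTML string. The names `ClassCounter` and `count_elements_with_class` and the sample HTML are made up for illustration; the class name `sh-dgr__content` is the one this answer's scraping code looks for, and it may change whenever Google changes its markup.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_elements_with_class(html, class_name):
    parser = ClassCounter(class_name)
    parser.feed(html)
    return parser.count


# Hypothetical sample standing in for a saved page source.
sample_html = """
<div class="sh-dgr__content"><h4>Item A</h4></div>
<div class="sh-dgr__content other"><h4>Item B</h4></div>
<div class="unrelated"></div>
"""

print(count_elements_with_class(sample_html, "sh-dgr__content"))
```

If the count is 0 for the HTML your script received, the item markup is probably not in that response (for example because the request was not treated as coming from a browser, or the class name has changed), and it is worth checking that before writing any parsing code.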

Code (full example):

from bs4 import BeautifulSoup
import requests
import json

# The 'lxml' parser used below requires the lxml package to be installed.

headers = {  # <-- so that Google treats your script as a "real" user browser.
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

response = requests.get(
  'https://www.google.com/search?q=minecraft+toys&tbm=shop',
  headers=headers,
  timeout=30)
response.raise_for_status()  # stop early on an HTTP error status

soup = BeautifulSoup(response.text, 'lxml')

data = []

# These class names come from Google's markup when this was written and may change.
for container in soup.find_all('div', class_='sh-dgr__content'):
  title = container.find('h4', class_='A2sOrd')
  price = container.find('span', class_='a8Pemb')
  supplier = container.find('div', class_='aULzUe IuHnof')

  data.append({
    # Fall back to None when an element is missing instead of raising AttributeError.
    "Title": title.text if title else None,
    "Price": price.text if price else None,
    "Supplier": supplier.text if supplier else None,
  })

print(json.dumps(data, indent=2, ensure_ascii=False))

Partial output:

[
  {
    "Title": "Lego Minecraft The Creeper Mine Building Set",
    "Price": "$63.99",
    "Supplier": "Walmart - Elevate Service Online"
  },
  {
    "Title": "LEGO Minecraft The Mountain Cave (21137)",
    "Price": "$139.95",
    "Supplier": "Game Yore"
  },
  {
    "Title": "Lego Minecraft The Nether Portal Set",
    "Price": "$92.36",
    "Supplier": "eBay - davesworkshop"
  },
  {
    "Title": "Lego Minecraft Toy, The Pig House",
    "Price": "$49.95",
    "Supplier": "Walmart - Sheen Empire"
  }
]
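
The request above uses a `/search?q=...&tbm=shop` URL, while the link in the question carries `q` and `tbm` after the `#`. Everything after `#` is the URL fragment, which is handled by the browser and is not sent to the server, so this is likely one reason the question's request came back without items. The sketch below builds the search URL for an arbitrary query and shows how the question's link splits into query and fragment. The helper name `build_shopping_url` is made up for illustration.

```python
from urllib.parse import urlencode, urlsplit


def build_shopping_url(query):
    # tbm=shop selects the Shopping tab, as in the request above.
    return "https://www.google.com/search?" + urlencode({"q": query, "tbm": "shop"})


url = build_shopping_url("hooker furniture 5183-75300")
print(url)

# The link from the question keeps q and tbm after '#', i.e. in the fragment.
question_url = (
    "https://www.google.com/?gfe_rd=cr&ei=BtcRWeX_D8aAsAHDgZ2QAw"
    "#q=hooker+furniture+5183-75300&tbm=shop"
)
parts = urlsplit(question_url)
print(parts.query)     # the part that is sent to the server
print(parts.fragment)  # the part that stays in the browser
```

Passing the result of `build_shopping_url(...)` to `requests.get` in place of the hard-coded URL should request the same kind of page for a different product.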

Alternatively, you can use SerpApi:

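from serpapi import GoogleSearch  # provided by the google-search-results package
import os

params = {
  "engine": "google",
  "q": "minecraft toys",
  "tbm": "shop",
  "api_key": os.getenv("API_KEY"),  # read the SerpApi key from the environment
}

search = GoogleSearch(params)
results = search.get_dict()

# 'shopping_results' may be absent if the search fails or returns nothing.
for result in results.get('shopping_results', []):
  print(f"Title: {result['title']}\nPrice: {result['price']}\nSupplier: {result['source']}\n")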

Partial output:

Title: Lego Minecraft The Creeper Mine Building Set4.8104
Price: $79.99
Supplier: Target

Title: LEGO Minecraft The Mountain Cave (21137)4.732
Price: $139.95
Supplier: Game Yore

Title: Lego Minecraft The Nether Portal Set4.787More options
Price: $92.36
Supplier: eBay - davesworkshop

Title: Lego Minecraft Toy, The Pig House4.850
Price: $43.99 $49.99
Supplier: Best Buy

Title: Lego 21160 Minecraft The Illager Raid4.9203
Price: $47.99
Supplier: Target

Title: Minecraft Kids Craft-A-Block Figures Assortment
Price: $12.00
Supplier: Selfridges
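
The prices in both outputs are plain strings, and one of them ("$43.99 $49.99") holds two amounts. If you need numbers, a minimal sketch like the one below could pull out every dollar amount. The function name `parse_prices` is made up for illustration, only the `$` format seen above is handled, and which of two amounts is the current price is not stated in the output, so that decision is left to the caller.

```python
import re
from decimal import Decimal


def parse_prices(price_text):
    """Return every dollar amount found in the string, in order of appearance."""
    # Matches e.g. "$12", "$79.99" or "$1,299.00"; other currencies are ignored.
    matches = re.findall(r"\$(\d[\d,]*(?:\.\d+)?)", price_text)
    return [Decimal(m.replace(",", "")) for m in matches]


print(parse_prices("$79.99"))
print(parse_prices("$43.99 $49.99"))
```

`Decimal` is used instead of `float` so that amounts such as 79.99 keep their exact value.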

Disclaimer: I work for SerpApi.