How to scrape Google Maps using Python

Time: 2017-12-29 12:24:40

Tags: python html web-scraping beautifulsoup scrapy

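I am trying to use Python to scrape the number of reviews for a place from Google Maps. For example, the restaurant Pike's Landing (see the Google Maps URL below) has 162 reviews. I want to extract this number in Python.

URL: https://www.google.com/maps?cid=15423079754231040967

I am not familiar with HTML, but based on some basic examples from the internet I wrote the following code. After running it, however, I get an empty variable. I would appreciate it if you could tell me what I am doing wrong here.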

from urllib.request import urlopen
from bs4 import BeautifulSoup

quote_page ='https://www.google.com/maps?cid=15423079754231040967'
page = urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
price_box = soup.find_all('button',attrs={'class':'widget-pane-link'})
print(price_box.text)

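2 answers:

Answer 0: (score: 0)

This is hard to do in pure Python without an API. Here is what I ended up with (note that I added &hl=en to the end of the URL to get results in English rather than in my language):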

import re
import requests
from ast import literal_eval

urls = [
'https://www.google.com/maps?cid=15423079754231040967&hl=en',
'https://www.google.com/maps?cid=16168151796978303235&hl=en']

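for url in urls:
    # Each match is an escaped array embedded in the page source that
    # starts with a URL and contains an "N reviews" string.
    for g in re.findall(r'\[\\"http.*?\d+ reviews?.*?]', requests.get(url).text):
        # Turn the fragment into a Python literal: null -> None, \" -> "
        data = literal_eval(g.replace('null', 'None').replace('\\"', '"'))
        # Decode backslash escapes (e.g. \uXXXX) remaining in the URL
        print(bytes(data[0], 'utf-8').decode('unicode_escape'))
        print(data[1])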

Prints:

http://www.google.com/search?q=Pike's+Landing,+4438+Airport+Way,+Fairbanks,+AK+99709,+USA&ludocid=15423079754231040967#lrd=0x51325b1733fa71bf:0xd609c9524d75cbc7,1
469 reviews
http://www.google.com/search?q=Sequoia+TreeScape,+Newmarket,+ON+L3Y+8R5,+Canada&ludocid=16168151796978303235#lrd=0x882ad2157062b6c3:0xe060d065957c4103,1
42 reviews
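
The second value printed for each place is a string such as "469 reviews", not a number. If an integer is needed, a minimal sketch like the following could convert it. The helper name parse_review_count is hypothetical, and the pattern assumes the English label format obtained with &hl=en; the thousands-separator handling is a guess about how larger counts are formatted.

```python
import re
from typing import Optional


def parse_review_count(label: str) -> Optional[int]:
    """Return the integer in a label like '469 reviews', or None if absent."""
    # Assumption: larger counts may carry thousands separators ("1,234 reviews").
    match = re.search(r'(\d[\d,]*)\s+reviews?', label)
    if match is None:
        return None
    return int(match.group(1).replace(',', ''))


print(parse_review_count('469 reviews'))  # 469
print(parse_review_count('1 review'))     # 1
```

Returning None instead of raising keeps a loop over many URLs going when one page does not contain the expected label.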

Answer 1: (score: 0)

You need to look at the page source and parse the window.APP_INITIALIZATION_STATE variable block; all the data you need is in there.
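
As a rough illustration of that idea, the sketch below pulls the text assigned to window.APP_INITIALIZATION_STATE out of an HTML string and then searches it for an "N reviews" label. The function names are hypothetical, the sample HTML is synthetic rather than real Google Maps output, and the assumption that the assignment ends at the next ";window." statement may not hold for the live page, whose layout is undocumented and can change.

```python
import re
from typing import Optional


def extract_init_state(html: str) -> Optional[str]:
    """Return the raw text assigned to window.APP_INITIALIZATION_STATE."""
    # Assumption: the assignment ends at the next ";window." statement.
    match = re.search(
        r'window\.APP_INITIALIZATION_STATE\s*=\s*(.*?);\s*window\.',
        html,
        flags=re.DOTALL,
    )
    return match.group(1) if match else None


def find_review_count(state: Optional[str]) -> Optional[int]:
    """Look for an 'N reviews' string anywhere inside the state block."""
    if state is None:
        return None
    match = re.search(r'(\d[\d,]*) reviews?', state)
    return int(match.group(1).replace(',', '')) if match else None


# Synthetic stand-in for a page source, not real Google Maps output.
sample_html = (
    '<script>window.APP_INITIALIZATION_STATE='
    '[[["Pike\'s Landing","469 reviews"]]];window.APP_FLAGS=[];</script>'
)
state = extract_init_state(sample_html)
print(find_review_count(state))  # 469
```

In practice the html argument would come from something like requests.get(url).text, and the extracted block would likely need further unescaping before it can be parsed as structured data.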


Alternatively, you can use the Google Maps Reviews API from SerpApi.

Example JSON output:

"place_results": {
  "title": "Pike's Landing",
  "data_id": "0x51325b1733fa71bf:0xd609c9524d75cbc7",
  "reviews_link": "https://serpapi.com/search.json?engine=google_maps_reviews&hl=en&place_id=0x51325b1733fa71bf%3A0xd609c9524d75cbc7",
  "gps_coordinates": {
    "latitude": 64.8299557,
    "longitude": -147.8488774
  },
  "place_id_search": "https://serpapi.com/search.json?data=%214m5%213m4%211s0x51325b1733fa71bf%3A0xd609c9524d75cbc7%218m2%213d64.8299557%214d-147.8488774&engine=google_maps&google_domain=google.com&hl=en&type=place",
  "thumbnail": "https://lh5.googleusercontent.com/p/AF1QipNtwheOCQ97QFrUNIwKYUoAPiV81rpiW5cIiQco=w152-h86-k-no",
  "rating": 3.9,
  "reviews": 839,
  "price": "$$",
  "type": [
    "American restaurant"
  ],
  "description": "Burgers, seafood, steak & river views. Pub fare alongside steak & seafood, served in a dining room with river views & a waterfront patio.",
  "service_options": {
    "dine_in": true,
    "curbside_pickup": true,
    "delivery": false
  }
}

Code to integrate:

import os
from serpapi import GoogleSearch

params = {
    "engine": "google_maps",
    "type": "search",
    "q": "pike's landing",
    "ll": "@40.7455096,-74.0083012,14z",
    "google_domain": "google.com",
    "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

reviews = results["place_results"]["reviews"]

print(reviews)

Output:

839
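
The lookup results["place_results"]["reviews"] raises KeyError if the response lacks either key. A more defensive variant might look like the sketch below; the helper name get_review_count is hypothetical, and it assumes only that a successful response has the shape of the sample JSON above. It runs against a trimmed copy of that sample rather than a live API call.

```python
from typing import Any, Dict, Optional


def get_review_count(results: Dict[str, Any]) -> Optional[int]:
    """Return results['place_results']['reviews'] if present, else None."""
    place = results.get("place_results")
    if not isinstance(place, dict):
        return None
    reviews = place.get("reviews")
    return reviews if isinstance(reviews, int) else None


# Trimmed copy of the sample JSON above, used instead of a live API call.
sample = {"place_results": {"title": "Pike's Landing", "rating": 3.9, "reviews": 839}}
print(get_review_count(sample))  # 839
print(get_review_count({}))      # None
```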

Disclaimer: I work for SerpApi.