Scraping Google reviews (Google Places) with BeautifulSoup

Date: 2016-12-13 04:02:18

Tags: html python-3.x beautifulsoup

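I'm trying to scrape user reviews from Google Places (the API only returns the 5 most helpful reviews). I'm using BeautifulSoup to retrieve 4 pieces of information: 1) the reviewer's name, 2) when the review was written, 3) the rating (out of 5), and 4) the review body.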

By inspecting each element, I can find where the information is located:

1) Reviewer name:

<a class="_e8k" style="color:black;text-decoration:none" href="https://www.google.com/maps/contrib/103603482673238284204/reviews">Steve Fox</a>

2) When the review was written:

<span style="color:#999;font-size:13px">3 months ago</span>

3) Rating (visible in the code, but not shown by "Run code snippet"):

<span class="_pxg _Jxg" aria-label="Rated 1.0 out of 5,"><span style="width:14px"></span></span>

4) Review body:

<span jsl="$t t-uvHqeLvCkgA;$x 0;" class="r-i8GVQS_tBTbg">Don't go near this company.  Must be the world's worst ISP.  Threatened to set debt collection services on me when I refused to pay for a service that they had cut off through  competence.  They even spitefully  managed to apply block on our internet connection after we moved to a new Isp.  I hate this company.</span>

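I'm struggling to work out how to reference the location of this information in the HTML. I can see that the last 3 pieces of information are in span tags, so I tried the following, but it doesn't return anything relevant: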

import bs4 as bs
import urllib.request
sauce = urllib.request.urlopen('https://www.google.co.nz/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=orcon&lrd=0x6d0d3833fefacf95:0x59fef608692d4541,1,').read()
soup = bs.BeautifulSoup(sauce, 'lxml')
attempt1 = soup.find_all('span class')
for span in attempt1:
    print(span)

I don't think I'm referencing the 4 pieces of information in the HTML correctly. Can anyone point out what's going wrong? Regards, Steve
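For reference, a likely cause: find_all('span class') looks for a tag literally named "span class", so it matches nothing. Also, everything after the # in the URL is a fragment that is never sent to the server, so urlopen only fetches the plain search page; the reviews themselves are probably loaded later by JavaScript. The sketch below therefore parses only the four snippets quoted above, offline, to show how each one could be selected by tag name plus class, style, or aria-label. The obfuscated class names (_e8k, _pxg, r-i8GVQS_tBTbg) are taken from the question and are likely to change on the live site.

```python
from bs4 import BeautifulSoup

# The four snippets from the question, wrapped in one <div>
# (review body shortened for brevity).
html = """
<div>
<a class="_e8k" style="color:black;text-decoration:none" href="https://www.google.com/maps/contrib/103603482673238284204/reviews">Steve Fox</a>
<span style="color:#999;font-size:13px">3 months ago</span>
<span class="_pxg _Jxg" aria-label="Rated 1.0 out of 5,"><span style="width:14px"></span></span>
<span jsl="$t t-uvHqeLvCkgA;$x 0;" class="r-i8GVQS_tBTbg">Don't go near this company.</span>
</div>
"""

# Built-in parser, so lxml is not required for this sketch.
soup = BeautifulSoup(html, "html.parser")

# 1) Reviewer name: the <a> tag whose class list contains "_e8k".
name = soup.find("a", class_="_e8k").get_text(strip=True)

# 2) Review date: this span has no class, so match its exact inline style.
date = soup.find("span", style="color:#999;font-size:13px").get_text(strip=True)

# 3) Rating: the stars are drawn with CSS, so read the aria-label attribute.
rating_label = soup.find("span", class_="_pxg")["aria-label"]
rating = float(rating_label.split()[1])  # "Rated 1.0 out of 5," -> 1.0

# 4) Review body: the span whose class list contains "r-i8GVQS_tBTbg".
body = soup.find("span", class_="r-i8GVQS_tBTbg").get_text(strip=True)

print(name, date, rating, body, sep=" | ")
# prints: Steve Fox | 3 months ago | 1.0 | Don't go near this company.
```

Note that class_ matches any single class in a multi-class attribute, which is why "_pxg" alone finds the span with class="_pxg _Jxg".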

1 Answer:

Answer 0 (score: 0)

To scrape a place's reviews, you need its place ID. It looks like this: 0x89c259a61c75684f:0x79d31adb123348d2

Then make a request to the following URL, which contains the place ID:

https://www.google.com/async/reviewDialog?hl=en&async=feature_id:0x89c259a61c75684f:0x79d31adb123348d2,sort_by:,next_page_token:,associated_topic:,_fmt:pc
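A minimal sketch of building that URL from an ID, assuming the ID format shown above. It also assumes, without confirmation, that the lrd parameter in the question's URL carries the same kind of ID (it has the same 0x...:0x... shape). The fetch helper and its User-Agent header are hypothetical; Google may block such requests or change this undocumented endpoint at any time, and the response format is not described here.

```python
import re
import urllib.request

# The URL from the question; its "lrd" parameter appears to hold the ID.
QUESTION_URL = ("https://www.google.co.nz/webhp?sourceid=chrome-instant&ion=1&espv=2"
                "&ie=UTF-8#q=orcon&lrd=0x6d0d3833fefacf95:0x59fef608692d4541,1,")


def extract_feature_id(url):
    """Return the first 0x...:0x... ID found in the URL, or None."""
    match = re.search(r"0x[0-9a-f]+:0x[0-9a-f]+", url, flags=re.IGNORECASE)
    return match.group(0) if match else None


def review_dialog_url(feature_id, hl="en"):
    """Build the reviewDialog URL in the same format as the answer's example."""
    return ("https://www.google.com/async/reviewDialog"
            f"?hl={hl}&async=feature_id:{feature_id},"
            "sort_by:,next_page_token:,associated_topic:,_fmt:pc")


def fetch(url):
    """Hypothetical helper: GET the URL with a browser-like User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")


feature_id = extract_feature_id(QUESTION_URL)
print(review_dialog_url(feature_id))
```

The returned HTML fragment could then be passed to BeautifulSoup, but which selectors it needs is not covered by this answer.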

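Alternatively, you can use a third-party solution such as SerpApi. It's a paid API with a free trial. We handle proxies, solve CAPTCHAs, and parse all the rich structured data for you.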

Example Python code (also available in other libraries):

from serpapi import GoogleSearch

params = {
  "engine": "google_maps_reviews",
  "place_id": "0x89c259a61c75684f:0x79d31adb123348d2",
  "hl": "en",
  "api_key": "secret_api_key"
}

search = GoogleSearch(params)
results = search.get_dict()

Example JSON output:

"reviews": [
  {
    "user": {
      "name": "HerbertTomlinson O",
      "link": "https://www.google.com/maps/contrib/100851257830988379503?hl=en-US&sa=X&ved=2ahUKEwiIlNzLtJrxAhVFWs0KHfclCwAQvvQBegQIARAy",
      "thumbnail": "https://lh3.googleusercontent.com/a/AATXAJyjD5T8NEJSdOUAveA8IuMDTLXE9edBHDpFTvZ8=s40-c-c0x00000000-cc-rp-mo-br100",
      "reviews": 2
    },
    "rating": 4,
    "date": "2 months ago",
    "snippet": "Finally, I found the best coffee shop today. Their choice of music is usually blasting from the past which was really relaxing and made me stay longer. There are tables for lovers and also for group of friends. The coffees and foods here are very affordable and well worth the money. You can't go wrong with this coffee shop. This is very worth to visit."
  },
  {
    "user": {
      "name": "Izaac Collier",
      "link": "https://www.google.com/maps/contrib/116734781291082397423?hl=en-US&sa=X&ved=2ahUKEwiIlNzLtJrxAhVFWs0KHfclCwAQvvQBegQIARA-",
      "thumbnail": "https://lh3.googleusercontent.com/a-/AOh14GgfhltPhiWrkTwe6swLUQRCWf_asuTfHPRnJCLc=s40-c-c0x00000000-cc-rp-mo-br100",
      "reviews": 2
    },
    "rating": 5,
    "date": "a month ago",
    "snippet": "I am not into coffee but one of my friends invited me here. As I looked the menu, I was convinced, so I ordered one for me. The food was tasty and the staff were very friendly and accommodating. The ambience was very cosy and comfortable. The coffee was great and super tasty. I will recommend this and will visit again!"
  },
  ...
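To get back the four fields from the question (name, date, rating, body), the dictionary returned by search.get_dict() can be walked like this. The results dict below is a trimmed, hand-copied stand-in for the sample output above (snippets shortened), and extract_reviews is a hypothetical helper name, not part of the SerpApi library.

```python
# Trimmed stand-in for search.get_dict(), shaped like the sample output.
results = {
    "reviews": [
        {
            "user": {"name": "HerbertTomlinson O", "reviews": 2},
            "rating": 4,
            "date": "2 months ago",
            "snippet": "Finally, I found the best coffee shop today.",
        },
        {
            "user": {"name": "Izaac Collier", "reviews": 2},
            "rating": 5,
            "date": "a month ago",
            "snippet": "I am not into coffee but one of my friends invited me here.",
        },
    ]
}


def extract_reviews(results):
    """Collect name, date, rating and body from each review dict."""
    rows = []
    for review in results.get("reviews", []):
        rows.append({
            "name": review.get("user", {}).get("name"),
            "date": review.get("date"),
            "rating": review.get("rating"),
            "body": review.get("snippet"),
        })
    return rows


for row in extract_reviews(results):
    print(f'{row["name"]} ({row["rating"]}/5, {row["date"]}): {row["body"]}')
```

Using .get() keeps the loop from raising KeyError if a review happens to lack one of these keys.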

Check out the documentation for more details.

Disclaimer: I work at SerpApi.