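Selenium Python unable to scroll down while fetching Google reviews

Time: 2018-12-12 19:21:39

Tags: python selenium web-scraping

I am trying to fetch Google reviews with Selenium in Python. I imported webdriver from the selenium module and then initialized self.driver as follows: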

self.driver = webdriver.Chrome(executable_path="./chromedriver.exe",chrome_options=webdriver.ChromeOptions())

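After that, I use the code below to type the name of the company whose reviews I need into the Google homepage. For now, I am trying to fetch the reviews for "STANLEY BRIDGE CYCLES AND SPORTS LIMITED":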

company_name = self.driver.find_element_by_name("q")
company_name.send_keys("STANLEY BRIDGE CYCLES AND SPORTS LIMITED ")
time.sleep(2)

Then I click the Google Search button with the following code:

self.driver.find_element_by_name("btnK").click()
time.sleep(2)

Finally, the results appear on the page. Now I want to click the "View all Google reviews" link, which I do with the following code:

self.driver.find_elements_by_link_text("View all Google reviews")[0].click()
time.sleep(2)

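Now I can get the reviews, but only 10 of them. I need at least 20 reviews for a company. To get them, I tried scrolling the page down with the following code:

self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(5)

Even after scrolling the page down with the code above, I still get only 10 reviews. I do not get any error.

I need help scrolling the page down so that I get at least 20 reviews. As of now, I can only get 10. From my online search on this issue, people mostly use driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") to scroll a page down when needed, but it does not work for me. I checked the page height before and after the call, and it is the same.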

3 answers:

Answer 0: (score: 1)

Use JavaScript to scroll to the last review; this triggers the loading of additional reviews.

last_review = self.driver.find_element_by_css_selector('div.gws-localreviews__google-review:last-of-type')
self.driver.execute_script('arguments[0].scrollIntoView(true);', last_review)

Edit:

The following example works correctly for me on both Firefox and Chrome. You can reuse the extract_google_reviews function as needed:

import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait


def extract_google_reviews(driver, query):
    # find_element(By...) is used instead of the find_element_by_* helpers,
    # which were removed in Selenium 4.3.
    driver.get('https://www.google.com/?hl=en')
    driver.find_element(By.NAME, 'q').send_keys(query)
    WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.NAME, 'btnK'))).click()

    # The link text is expected to start with the total number of reviews.
    reviews_header = driver.find_element(By.CSS_SELECTOR, 'div.kp-header')
    reviews_link = reviews_header.find_element(By.PARTIAL_LINK_TEXT, 'Google reviews')
    number_of_reviews = int(reviews_link.text.split()[0])
    reviews_link.click()

    all_reviews = WebDriverWait(driver, 3).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div.gws-localreviews__google-review')))
    # Scroll the last loaded review into view until every review is loaded.
    while len(all_reviews) < number_of_reviews:
        driver.execute_script('arguments[0].scrollIntoView(true);', all_reviews[-1])
        # Wait for the loading indicator to disappear before counting again.
        WebDriverWait(driver, 5, 0.25).until_not(EC.presence_of_element_located((By.CSS_SELECTOR, 'div[class$="activityIndicator"]')))
        all_reviews = driver.find_elements(By.CSS_SELECTOR, 'div.gws-localreviews__google-review')

    reviews = []
    for review in all_reviews:
        try:
            full_text_element = review.find_element(By.CSS_SELECTOR, 'span.review-full-text')
        except NoSuchElementException:
            # Fall back to the plain text span when there is no full-text span.
            full_text_element = review.find_element(By.CSS_SELECTOR, 'span[class^="r-"]')
        reviews.append(full_text_element.get_attribute('textContent'))

    return reviews

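if __name__ == '__main__':
    # Create the driver before the try block so that `driver` is always
    # defined when the finally clause runs.
    driver = webdriver.Firefox()
    try:
        reviews = extract_google_reviews(driver, 'STANLEY BRIDGE CYCLES AND SPORTS LIMITED')
    finally:
        driver.quit()

    print(reviews)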

Answer 1: (score: 0)

lenOfPage = driver.execute_script('window.scrollTo(0, [hard code the height])')

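For me, I hard-code the height when I run automated tests against the same page over and over.

Alternatively, you can have it scroll down the page in a loop until it finds the element.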
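
The looping idea can be sketched roughly as follows. This is only a minimal sketch, not the answerer's code: the helper name scroll_until_loaded and its parameters are hypothetical, and it assumes a Selenium 4 style driver whose find_elements accepts the locator strategy string "css selector" (the value of By.CSS_SELECTOR). It scrolls the last matching element into view and stops once enough elements are loaded, once a scroll loads nothing new, or after a fixed number of rounds.

```python
import time


def scroll_until_loaded(driver, css_selector, target_count, max_rounds=20, pause=1.0):
    """Scroll until at least target_count elements match css_selector.

    Hypothetical helper: `driver` is assumed to behave like a Selenium
    WebDriver (it needs find_elements and execute_script).
    """
    elements = driver.find_elements("css selector", css_selector)
    for _ in range(max_rounds):
        if not elements or len(elements) >= target_count:
            break
        # Bring the last loaded element into view to trigger lazy loading.
        driver.execute_script("arguments[0].scrollIntoView(true);", elements[-1])
        time.sleep(pause)  # crude wait; an explicit WebDriverWait is more robust
        reloaded = driver.find_elements("css selector", css_selector)
        if len(reloaded) == len(elements):
            break  # nothing new was loaded, so stop instead of looping forever
        elements = reloaded
    return elements
```

With the driver from the question, a call might look like scroll_until_loaded(self.driver, 'div.gws-localreviews__google-review', 20), assuming that selector still matches the review elements.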

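Answer 2: (score: 0)

Alternatively, you can get all the reviews without browser automation.

The only thing you need is the data_fid, which you can find in the page source of the place you searched for.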


In this case: 0x48762038283b0bc3:0xc373b8d4227d0090

After that, you just need to make a request to: https://www.google.com/async/reviewDialog?hl=en&async=feature_id:0x48762038283b0bc3:0xc373b8d4227d0090,sort_by:,next_page_token:,associated_topic:,_fmt:pc

There you will find all the review data along with the next_page_token, which lets you query the next 10 reviews.

In this case, the next_page_token is: EgIICg

So the request URL for the next 10 reviews is: https://www.google.com/async/reviewDialog?hl=en&async=feature_id:0x48762038283b0bc3:0xc373b8d4227d0090,sort_by:,next_page_token:EgIICg,associated_topic:,_fmt:pc
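
The two URLs above differ only in the next_page_token value, so they can be assembled programmatically. A minimal sketch, assuming the URL layout stays exactly as shown; the helper name build_reviews_url is hypothetical, and nothing here actually sends a request:

```python
BASE_URL = "https://www.google.com/async/reviewDialog"


def build_reviews_url(feature_id, next_page_token="", hl="en"):
    """Build the review-dialog URL in the layout shown above (hypothetical helper)."""
    # The async parameter is a comma-separated list of key:value pairs;
    # empty values are left blank, exactly as in the URLs above.
    async_params = ",".join([
        f"feature_id:{feature_id}",
        "sort_by:",
        f"next_page_token:{next_page_token}",
        "associated_topic:",
        "_fmt:pc",
    ])
    return f"{BASE_URL}?hl={hl}&async={async_params}"


first_page = build_reviews_url("0x48762038283b0bc3:0xc373b8d4227d0090")
second_page = build_reviews_url("0x48762038283b0bc3:0xc373b8d4227d0090", "EgIICg")
print(first_page)
print(second_page)
```

The resulting URL can be fetched with any HTTP client. How the response encodes the next_page_token is not described here, so extracting it is left out of the sketch.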

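You can also use a third-party solution such as SerpApi. It is a paid API with a free trial. We handle proxies, solve CAPTCHAs, and parse all the rich structured data for you.

Example Python code (also available in other libraries):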

from serpapi import GoogleSearch

params = {
  "api_key": "secret_api_key",
  "engine": "google_maps_reviews",
  "hl": "en",
  "data_id": "0x48762038283b0bc3:0xc373b8d4227d0090",
}

search = GoogleSearch(params)
results = search.get_dict()

Example JSON output:

"place_info": {
  "title": "Stanley Bridge Cycles & Sports Ltd",
  "address": "Newnham Parade, 11 College Rd, Cheshunt, Waltham Cross, United Kingdom",
  "rating": 5,
  "reviews": 53
},
"reviews": [
  {
    "user": {
      "name": "Armilson Correia",
      "link": "https://www.google.com/maps/contrib/102797076683495103766?hl=en-US&sa=X&ved=2ahUKEwja2tvQj-DxAhUHMVkFHcJuD_MQvvQBegQIARAh",
      "thumbnail": "https://lh3.googleusercontent.com/a-/AOh14GgCCH69E_qgfu3pa1xbTsyvH9ORn8PEonb5FcubKg=s40-c-c0x00000000-cc-rp-mo-ba3-br100",
      "local_guide": true,
      "reviews": 48,
      "photos": 9
    },
    "rating": 5,
    "date": "2 days ago",
    "snippet": "In my opinion The best bike shop In radios of 60 miles Very professional and excellent customer service My bike come out from there riding like a New ,no Words just perfect"
  },
  {
    "user": {
      "name": "John Janes",
      "link": "https://www.google.com/maps/contrib/104286744244406721398?hl=en-US&sa=X&ved=2ahUKEwja2tvQj-DxAhUHMVkFHcJuD_MQvvQBegQIARAt",
      "thumbnail": "https://lh3.googleusercontent.com/a/AATXAJzRZRQx74RYqpNQArE0ER-d24iQ-3kAwK64-46u=s40-c-c0x00000000-cc-rp-mo-br100",
      "reviews": 2,
      "photos": 1
    },
    "rating": 5,
    "date": "a year ago",
    "snippet": "The guys recently built my new bike and the advice on components to use was invaluable. Even the wheels were built from scratch. A knowledgeable efficient team with great attention to detail. I wouldn't go anywhere else .",
    "likes": 1,
    "images": [
      "https://lh5.googleusercontent.com/p/AF1QipMc5u1rIZ88w-cfeAeF2s6bSndHMhLw8YC_BllS=w100-h100-p-n-k-no"
    ]
  },
  {
    "user": {
      "name": "James Wainwright",
      "link": "https://www.google.com/maps/contrib/116302076794615919905?hl=en-US&sa=X&ved=2ahUKEwja2tvQj-DxAhUHMVkFHcJuD_MQvvQBegQIARA6",
      "thumbnail": "https://lh3.googleusercontent.com/a/AATXAJwx8OTba1pQ9lrzxy7LU5BnrJYWu90METBaK68F=s40-c-c0x00000000-cc-rp-mo-br100",
      "reviews": 36,
      "photos": 7
    },
    "rating": 5,
    "date": "a month ago",
    "snippet": "Want to thank the guys for giving my bike the full service it needed .Its now like new again and I didn't realise how much had worn out.Recomend to anyone in the cheshunt area."
  },
  ...
]
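
Once the response is parsed into a dictionary, the review fields can be read directly. A minimal sketch, assuming the dictionary has the shape of the JSON output above; the helper name summarize_reviews is hypothetical, and the sample data is a shortened copy of that output rather than a real API response:

```python
def summarize_reviews(results):
    """Return (author, rating, snippet) tuples from a results dictionary."""
    summary = []
    for review in results.get("reviews", []):
        author = review.get("user", {}).get("name", "")
        summary.append((author, review.get("rating"), review.get("snippet", "")))
    return summary


# Shortened sample in the shape of the JSON output above (snippets truncated).
sample = {
    "place_info": {"title": "Stanley Bridge Cycles & Sports Ltd", "rating": 5, "reviews": 53},
    "reviews": [
        {"user": {"name": "Armilson Correia"}, "rating": 5, "snippet": "In my opinion The best bike shop"},
        {"user": {"name": "John Janes"}, "rating": 5, "snippet": "The guys recently built my new bike"},
    ],
}

for author, rating, snippet in summarize_reviews(sample):
    print(f"{author} ({rating}/5): {snippet}")
```

Using .get with defaults keeps the loop from failing on reviews that lack optional fields, since not every review in the output carries the same keys.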

See the documentation for more details.

Disclaimer: I work at SerpApi.