I am trying to scrape Google reviews with Selenium in Python. I imported webdriver from the selenium module and then initialized self.driver as follows:
self.driver = webdriver.Chrome(executable_path="./chromedriver.exe",chrome_options=webdriver.ChromeOptions())
After that, I use the code below to type the company name into the search box on the Google homepage. For now, I am trying to get the reviews for "STANLEY BRIDGE CYCLES AND SPORTS LIMITED":
company_name = self.driver.find_element_by_name("q")
company_name.send_keys("STANLEY BRIDGE CYCLES AND SPORTS LIMITED ")
time.sleep(2)
Then I click the Google Search button with the following code:
self.driver.find_element_by_name("btnK").click()
time.sleep(2)
The results then appear on the page. Next, I want to click the "View all Google reviews" link, which I do with the following code:
self.driver.find_elements_by_link_text("View all Google reviews")[0].click()
time.sleep(2)
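Now I can get the reviews, but only 10 of them. I need at least 20 reviews for the company. To load more, I tried scrolling down the page with the following code: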
self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(5)
Even after scrolling down with the code above, I still get only 10 reviews, and no error is raised.
I need help scrolling the page so that at least 20 reviews are loaded. From what I found online, people commonly use driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") to scroll down when needed, but it does not work for me. I checked the page height before and after that call, and it is the same.
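Answer 0 (score: 1)
Use JavaScript to scroll the last review into view; this triggers loading of additional reviews. The reviews appear to be rendered in a dialog that scrolls independently of the page, which would explain why window.scrollTo leaves the page height unchanged.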
last_review = self.driver.find_element_by_css_selector('div.gws-localreviews__google-review:last-of-type')
self.driver.execute_script('arguments[0].scrollIntoView(true);', last_review)
Edit:
The following example works for me on both Firefox and Chrome. You can reuse the extract_google_reviews function as needed.
import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait


def extract_google_reviews(driver, query):
    # Search for the company on the English-language Google homepage.
    driver.get('https://www.google.com/?hl=en')
    driver.find_element_by_name('q').send_keys(query)
    WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.NAME, 'btnK'))).click()

    # The first token of the link text is parsed as the total review count.
    reviews_header = driver.find_element_by_css_selector('div.kp-header')
    reviews_link = reviews_header.find_element_by_partial_link_text('Google reviews')
    number_of_reviews = int(reviews_link.text.split()[0])
    reviews_link.click()

    all_reviews = WebDriverWait(driver, 3).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div.gws-localreviews__google-review')))
    while len(all_reviews) < number_of_reviews:
        # Scrolling the last loaded review into view triggers the next batch.
        driver.execute_script('arguments[0].scrollIntoView(true);', all_reviews[-1])
        # Wait until the loading indicator is gone before counting again.
        WebDriverWait(driver, 5, 0.25).until_not(
            EC.presence_of_element_located((By.CSS_SELECTOR, 'div[class$="activityIndicator"]')))
        all_reviews = driver.find_elements_by_css_selector('div.gws-localreviews__google-review')

    reviews = []
    for review in all_reviews:
        try:
            # Long reviews keep their complete text in a separate element.
            full_text_element = review.find_element_by_css_selector('span.review-full-text')
        except NoSuchElementException:
            full_text_element = review.find_element_by_css_selector('span[class^="r-"]')
        reviews.append(full_text_element.get_attribute('textContent'))
    return reviews


if __name__ == '__main__':
    # Create the driver before the try block so that quit() is only
    # called on a driver that actually started.
    driver = webdriver.Firefox()
    try:
        reviews = extract_google_reviews(driver, 'STANLEY BRIDGE CYCLES AND SPORTS LIMITED')
    finally:
        driver.quit()

    print(reviews)
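The while loop in the example above keeps running if fewer reviews load than the header count promises. A minimal sketch of a bounded variant follows, assuming the same div.gws-localreviews__google-review selector and the same find_elements_by_css_selector API used in this thread. The names load_reviews, max_rounds, and pause are hypothetical and not part of Selenium.

```python
import time


def load_reviews(driver, target, selector='div.gws-localreviews__google-review',
                 max_rounds=30, pause=1.0):
    """Scroll the last review into view until `target` reviews are loaded,
    the count stops growing, or `max_rounds` is reached."""
    # find_elements_by_css_selector matches the API used in this thread;
    # newer Selenium releases use find_elements(By.CSS_SELECTOR, selector).
    reviews = driver.find_elements_by_css_selector(selector)
    for _ in range(max_rounds):
        if not reviews or len(reviews) >= target:
            break
        # Scrolling the last loaded review into view triggers the next batch.
        driver.execute_script('arguments[0].scrollIntoView(true);', reviews[-1])
        time.sleep(pause)  # crude pause; an explicit wait is more robust
        refreshed = driver.find_elements_by_css_selector(selector)
        if len(refreshed) == len(reviews):
            break  # nothing new was loaded, so stop instead of looping forever
        reviews = refreshed
    return reviews
```

Calling load_reviews(driver, 20) after the reviews dialog has opened would return the review elements found so far, which may be fewer than 20 if the company has fewer reviews.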
Answer 1 (score: 0)
lenOfPage = driver.execute_script('window.scrollTo(0, [hard code the height])')
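For me, when I run an automated test against the same page over and over, I hard-code the height.
Alternatively, you can scroll down in a loop until the element you are looking for is found.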
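The loop suggested above could look roughly like the sketch below. It scrolls the window in fixed steps and stops once a matching element exists. The function name scroll_until_found and its parameters are hypothetical, and the sketch assumes the element lives in the main page rather than in a separately scrolling dialog.

```python
import time


def scroll_until_found(driver, selector, step=800, max_scrolls=25, pause=0.5):
    """Scroll down in `step`-pixel increments until an element matching
    `selector` exists; return it, or None if it never appears."""
    for _ in range(max_scrolls):
        matches = driver.find_elements_by_css_selector(selector)
        if matches:
            return matches[0]
        # Scroll relative to the current position instead of a fixed height.
        driver.execute_script('window.scrollBy(0, arguments[0]);', step)
        time.sleep(pause)
    # One last check after the final scroll.
    matches = driver.find_elements_by_css_selector(selector)
    return matches[0] if matches else None
```

The max_scrolls bound keeps the loop from running forever when the element never shows up.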
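Answer 2 (score: 0)
Alternatively, you can get all the reviews without browser automation.
The only thing you need is the data_fid, which you can find in the page source of the place you searched for.
In this case: 0x48762038283b0bc3:0xc373b8d4227d0090
There you will find all the review data, as well as a next_page_token that lets you request the next 10 reviews.
In this case, the next_page_token is: EgIICg
So the request URL for the next 10 reviews is: https://www.google.com/async/reviewDialog?hl=en&async=feature_id:0x48762038283b0bc3:0xc373b8d4227d0090,sort_by:,next_page_token:EgIICg,associated_topic:,_fmt:pc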
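As a small sketch, the URL above can be assembled from the data_fid and the next_page_token. The helper name build_reviews_url is hypothetical. The endpoint is undocumented and may change, and leaving the token empty by default is an assumption, not something the answer confirms.

```python
def build_reviews_url(feature_id, next_page_token='', hl='en'):
    """Build the reviewDialog URL for one page of reviews.

    The empty default for `next_page_token` is an assumption.
    """
    async_params = ','.join([
        f'feature_id:{feature_id}',
        'sort_by:',
        f'next_page_token:{next_page_token}',
        'associated_topic:',
        '_fmt:pc',
    ])
    return f'https://www.google.com/async/reviewDialog?hl={hl}&async={async_params}'


url = build_reviews_url('0x48762038283b0bc3:0xc373b8d4227d0090', 'EgIICg')
```

Here url matches the request URL quoted above for the next 10 reviews.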
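You can also use a third-party solution such as SerpApi. It is a paid API with a free trial. We handle proxies, solve CAPTCHAs, and parse all the rich structured data for you.
Example Python code (also available in other libraries):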
from serpapi import GoogleSearch
params = {
    "api_key": "secret_api_key",
    "engine": "google_maps_reviews",
    "hl": "en",
    "data_id": "0x48762038283b0bc3:0xc373b8d4227d0090",
}
search = GoogleSearch(params)
results = search.get_dict()
Example JSON output:
"place_info": {
  "title": "Stanley Bridge Cycles & Sports Ltd",
  "address": "Newnham Parade, 11 College Rd, Cheshunt, Waltham Cross, United Kingdom",
  "rating": 5,
  "reviews": 53
},
"reviews": [
  {
    "user": {
      "name": "Armilson Correia",
      "link": "https://www.google.com/maps/contrib/102797076683495103766?hl=en-US&sa=X&ved=2ahUKEwja2tvQj-DxAhUHMVkFHcJuD_MQvvQBegQIARAh",
      "thumbnail": "https://lh3.googleusercontent.com/a-/AOh14GgCCH69E_qgfu3pa1xbTsyvH9ORn8PEonb5FcubKg=s40-c-c0x00000000-cc-rp-mo-ba3-br100",
      "local_guide": true,
      "reviews": 48,
      "photos": 9
    },
    "rating": 5,
    "date": "2 days ago",
    "snippet": "In my opinion The best bike shop In radios of 60 miles Very professional and excellent customer service My bike come out from there riding like a New ,no Words just perfect"
  },
  {
    "user": {
      "name": "John Janes",
      "link": "https://www.google.com/maps/contrib/104286744244406721398?hl=en-US&sa=X&ved=2ahUKEwja2tvQj-DxAhUHMVkFHcJuD_MQvvQBegQIARAt",
      "thumbnail": "https://lh3.googleusercontent.com/a/AATXAJzRZRQx74RYqpNQArE0ER-d24iQ-3kAwK64-46u=s40-c-c0x00000000-cc-rp-mo-br100",
      "reviews": 2,
      "photos": 1
    },
    "rating": 5,
    "date": "a year ago",
    "snippet": "The guys recently built my new bike and the advice on components to use was invaluable. Even the wheels were built from scratch. A knowledgeable efficient team with great attention to detail. I wouldn't go anywhere else .",
    "likes": 1,
    "images": [
      "https://lh5.googleusercontent.com/p/AF1QipMc5u1rIZ88w-cfeAeF2s6bSndHMhLw8YC_BllS=w100-h100-p-n-k-no"
    ]
  },
  {
    "user": {
      "name": "James Wainwright",
      "link": "https://www.google.com/maps/contrib/116302076794615919905?hl=en-US&sa=X&ved=2ahUKEwja2tvQj-DxAhUHMVkFHcJuD_MQvvQBegQIARA6",
      "thumbnail": "https://lh3.googleusercontent.com/a/AATXAJwx8OTba1pQ9lrzxy7LU5BnrJYWu90METBaK68F=s40-c-c0x00000000-cc-rp-mo-br100",
      "reviews": 36,
      "photos": 7
    },
    "rating": 5,
    "date": "a month ago",
    "snippet": "Want to thank the guys for giving my bike the full service it needed .Its now like new again and I didn't realise how much had worn out.Recomend to anyone in the cheshunt area."
  },
  ...
]
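Once the response is available as a Python dict, the reviews can be flattened into simple tuples. The sketch below assumes the dict has the shape of the sample output above. The helper name summarize_reviews is hypothetical and not part of the serpapi package.

```python
def summarize_reviews(results):
    """Return (author, rating, snippet) tuples from a response dict
    shaped like the sample JSON output."""
    summary = []
    for review in results.get('reviews', []):
        user = review.get('user', {})
        # .get() tolerates optional fields that some reviews omit.
        summary.append((user.get('name'), review.get('rating'), review.get('snippet', '')))
    return summary


sample = {
    'place_info': {'title': 'Stanley Bridge Cycles & Sports Ltd', 'rating': 5, 'reviews': 53},
    'reviews': [
        {'user': {'name': 'John Janes', 'reviews': 2}, 'rating': 5, 'snippet': 'Great team.'},
        {'user': {'name': 'James Wainwright'}, 'rating': 5},
    ],
}
rows = summarize_reviews(sample)
```

The sample dict is shortened illustrative data, not a real API response.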
See the documentation for more details.
Disclaimer: I work for SerpApi.