How to scrape the rating and all reviews from a website using Selenium

Time: 2021-06-13 03:46:03

Tags: python pandas selenium web-scraping

I want to scrape the rating and all the reviews on the page, but I cannot find the right path to those elements.

import urllib.request
from bs4 import BeautifulSoup
import csv
import os
from selenium import webdriver
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.keys import Keys
import pandas as pd
import time

# Location of the ChromeDriver executable
chrome_path = r'C:/Users/91940/AppData/Local/Programs/Python/Python39/Scripts/chromedriver.exe'
driver = webdriver.Chrome(executable_path=chrome_path)
driver.implicitly_wait(10)  # wait up to 10 seconds for elements to appear
driver.get("https://www.lazada.sg/products/samsung-galaxy-watch3-bt-45mm-titanium-i1156462257-s4537770883.html?search=1&freeshipping=1")

# Product title
product_name = driver.find_element_by_xpath('//*[@id="module_product_title_1"]/div/div/h1')
print(product_name.text)

# Average rating
rating = driver.find_element_by_xpath("//span[@class='score-average']")
print(rating.text)

# Review block
review = driver.find_element_by_xpath('//*[@id="module_product_review"]/div/div/div[3]/div[1]/div[1]')
print(review.text)

2 answers:

Answer 0 (score: 0)

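Maybe there is a problem with your path? (Sorry, I am not on Windows, so I cannot test this.) From memory, Windows paths use the \ character rather than /. Also, you may need two backslashes after the drive letter (C:\\).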

c:\\Users\91940\AppData\Local\...
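
As a quick way to compare the path spellings discussed here, the sketch below writes the chromedriver location from the question three ways and checks them against each other. It uses only the standard library; `PureWindowsPath` applies Windows path rules on any operating system, so it can be run without a Windows machine. The variable names are illustrative.

```python
from pathlib import PureWindowsPath

# The chromedriver location from the question, written three ways.
forward = "C:/Users/91940/AppData/Local/Programs/Python/Python39/Scripts/chromedriver.exe"
raw = r"C:\Users\91940\AppData\Local\Programs\Python\Python39\Scripts\chromedriver.exe"
escaped = "C:\\Users\\91940\\AppData\\Local\\Programs\\Python\\Python39\\Scripts\\chromedriver.exe"

# A raw string and a fully escaped string hold exactly the same characters.
same_text = raw == escaped

# PureWindowsPath treats / and \ as the same separator when comparing paths.
same_path = PureWindowsPath(forward) == PureWindowsPath(raw)

print(same_text, same_path)
```

Both checks print True, so the forward slashes in the question's path may not be the cause by themselves.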

Answer 1 (score: 0)

I believe print(product_name.text) is executing correctly, right?

The problem is with driver.find_element_by_xpath("//span[@class='score-average']"): I cannot find score-average anywhere in the HTML source.

So try this:

rate = driver.find_element_by_css_selector("div.pdp-review-summary")
print(rate.text)
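
To check whether a class name such as score-average is present before writing a selector for it, something like the following sketch may help. It collects every class name found in an HTML string using only the standard library. The helper names `ClassCollector` and `classes_in` are made up for illustration, and the `sample` fragment is a hypothetical stand-in for `driver.page_source`, not the real Lazada markup.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a class value may hold several names.
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def classes_in(page_source):
    """Return the set of class names used anywhere in page_source."""
    collector = ClassCollector()
    collector.feed(page_source)
    collector.close()
    return collector.classes


# Hypothetical fragment standing in for driver.page_source.
sample = (
    '<div class="pdp-review-summary">'
    '<a class="pdp-link pdp-review-summary__link">Ratings</a>'
    '</div>'
)
found = classes_in(sample)
print("score-average" in found)       # False for this sample
print("pdp-review-summary" in found)  # True for this sample
```

With a live session, passing `driver.page_source` to `classes_in` would run the same check against the page as the browser currently sees it.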

You can try the following code to get the reviews:

wait = WebDriverWait(driver, 10)
driver.get("https://www.lazada.sg/products/samsung-galaxy-watch3-bt-45mm-titanium-i1156462257-s4537770883.html?search=1&freeshipping=1")
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a[class$='pdp-review-summary__link']"))).click()
ActionChains(driver).move_to_element(wait.until(EC.visibility_of_element_located((By.XPATH, "//h2[contains(text(), 'Ratings & Reviews')]")))).perform()
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.item-content")))
for review in driver.find_elements(By.CSS_SELECTOR, "div.item-content"):
    print(review.get_attribute('innerHTML'))

Imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
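
The loop in this answer prints the raw innerHTML of each review, tags included. If only the text is wanted, a minimal sketch along these lines could reduce such a fragment to plain text with the standard library. `TextExtractor` and `inner_html_to_text` are illustrative names, and the `sample` fragment with its class names is invented for the example rather than copied from the site.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Gather the non-empty text pieces of an HTML fragment, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def inner_html_to_text(fragment):
    """Return the visible text of fragment, with pieces joined by single spaces."""
    extractor = TextExtractor()
    extractor.feed(fragment)
    extractor.close()
    return " ".join(extractor.chunks)


# Hypothetical review fragment; the class names are made up.
sample = '<div class="review-text">Great watch</div><div class="review-sku">Colour: Black</div>'
text = inner_html_to_text(sample)
print(text)
```

Inside the loop, this would be called as `inner_html_to_text(review.get_attribute('innerHTML'))`. Selenium's own `review.text` property is another option when the element is visible on the page.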