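I'm taking my first steps in web scraping, but I may have picked a "hard" problem.
I'm trying to download the user ratings (5 stars, 4 stars, and so on) from this website, https://www.influenster.com/, but what I get back are the review dates.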
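from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

browser = webdriver.Chrome(service=Service("C:/chromedriver_win32/chromedriver.exe"))
browser.get("https://www.influenster.com/reviews/ferrero-rocher-chocolate")
# Intended to grab the star ratings, but this selector returns the review dates
rating = browser.find_elements(By.CSS_SELECTOR, ".hXWPE.ixyxcj")
rating_text = [t.text for t in rating]
print(rating_text)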
Output:
['April 11th 2020, 9:49 pm',
'April 16th 2020, 9:30 pm',
'March 16th 2020, 10:32 am',
'December 22nd 2019, 2:13 am',
'March 31st 2020, 2:02 am',
'April 11th 2020, 7:10 pm',
'April 12th 2020, 2:13 pm',
'May 15th 2020, 6:00 am',
'April 13th 2020, 7:39 pm',
'January 23rd 2020, 4:02 pm']
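Can you help me? Thanks.
Answer (score: 1)
The dates also match that CSS selector, and because they sit before the reviews in the page, they are returned first.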
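A minimal offline sketch of that behaviour, using only the standard-library `html.parser` rather than Selenium. It mimics a compound class selector such as `.hXWPE.ixyxcj`: every element carrying all the listed classes is matched, in document order. The markup is hypothetical (the real Influenster page and its generated class names may differ); it simply assumes the date and the review body share those classes, and that the body also has a `review-text` class.

```python
from html.parser import HTMLParser


class CompoundClassMatcher(HTMLParser):
    """Collect the text of elements whose class list contains all required classes.

    Sketch only: assumes matched elements contain no void tags such as <br>,
    since those never produce an end tag and would throw off the depth count.
    """

    def __init__(self, required_classes):
        super().__init__()
        self.required = set(required_classes)
        self.depth = 0     # nesting depth inside the current match (0 = outside)
        self.buffer = []   # text chunks of the current match
        self.matches = []  # matched texts, in document order

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.required <= classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.matches.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def select_texts(html, *classes):
    parser = CompoundClassMatcher(classes)
    parser.feed(html)
    parser.close()
    return parser.matches


# Hypothetical markup: date and review body share the same generated classes.
SAMPLE = """
<div class="review">
  <span class="hXWPE ixyxcj">April 11th 2020, 9:49 pm</span>
  <div class="hXWPE ixyxcj review-text">Love these!</div>
</div>
<div class="review">
  <span class="hXWPE ixyxcj">April 16th 2020, 9:30 pm</span>
  <div class="hXWPE ixyxcj review-text">A bit too sweet.</div>
</div>
"""

if __name__ == "__main__":
    print(select_texts(SAMPLE, "hXWPE", "ixyxcj"))  # dates and reviews interleaved
    print(select_texts(SAMPLE, "review-text"))      # review bodies only
```

With the shared classes, each date is matched before its review; a class that only the review body carries isolates the text.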
Try the following to get the review text:
from selenium.webdriver.common.by import By
reviews = browser.find_elements(By.CLASS_NAME, "review-text")
For the author and the star rating:
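from selenium.webdriver.common.by import By

# Each matched container holds one review's author card and star icons
ratings = browser.find_elements(By.CSS_SELECTOR, ".ixyxcj>.fNRjgH")
for rating in ratings:
    author = rating.find_element(By.CSS_SELECTOR, ".author-card .name").text
    # Count only the star icons inside this review that carry the extra (filled) class
    stars = rating.find_elements(By.CSS_SELECTOR, ".ixyxcj .productComponents__SingleStar-sc-1ffpes9-3.kdXCBs")
    print(f"The author: {author} gave {len(stars)} stars")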
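The same per-review idea (find the author inside each review container, then count its star elements) can be sketched offline with the standard-library `html.parser`. Everything here is hypothetical: the `review-card`, `author-card`, `name`, `star` and `filled` classes stand in for the site's generated names, and it assumes, as the extra `kdXCBs` class in the selector seems to suggest, that filled stars carry a class that empty ones lack.

```python
from html.parser import HTMLParser


class StarCounter(HTMLParser):
    """Record (author, filled-star count) for each hypothetical .review-card element."""

    def __init__(self):
        super().__init__()
        self.stack = []    # class sets of the currently open elements
        self.reviews = []  # one {"author": ..., "stars": ...} dict per review card

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        self.stack.append(classes)
        if "review-card" in classes:
            self.reviews.append({"author": "", "stars": 0})
        elif {"star", "filled"} <= classes and any("review-card" in c for c in self.stack):
            # A filled star inside a review card counts toward that card's rating
            self.reviews[-1]["stars"] += 1

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text directly inside an ".author-card .name" element is the author name
        if self.reviews and self.stack and "name" in self.stack[-1] \
                and any("author-card" in c for c in self.stack[:-1]):
            self.reviews[-1]["author"] += data


def count_stars(html):
    parser = StarCounter()
    parser.feed(html)
    parser.close()
    return [(r["author"].strip(), r["stars"]) for r in parser.reviews]


SAMPLE = """
<div class="review-card">
  <div class="author-card"><span class="name">Alice</span></div>
  <div class="rating">
    <i class="star filled"></i><i class="star filled"></i><i class="star filled"></i>
    <i class="star filled"></i><i class="star"></i>
  </div>
</div>
<div class="review-card">
  <div class="author-card"><span class="name">Bob</span></div>
  <div class="rating">
    <i class="star filled"></i><i class="star filled"></i><i class="star"></i>
    <i class="star"></i><i class="star"></i>
  </div>
</div>
"""

if __name__ == "__main__":
    for author, stars in count_stars(SAMPLE):
        print(f"The author: {author} gave {stars} stars")
```

Scoping the search to each review container, as the Selenium loop does, is what keeps one reviewer's stars from being counted under another's name.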