Python | Web scraping user reviews

Date: 2020-05-27 19:25:04

Tags: python selenium web-scraping

I'm taking my first steps in web scraping, but I may have picked a "hard" problem to start with.

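I'm trying to download the user ratings (5 stars, 4 stars, and so on) from this website, https://www.influenster.com/, but what I get back is the review dates instead.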

from selenium import webdriver

browser = webdriver.Chrome("C:/chromedriver_win32/chromedriver.exe")
browser.get("https://www.influenster.com/reviews/ferrero-rocher-chocolate")

# Select elements that carry both CSS classes and collect their visible text
rating = browser.find_elements_by_css_selector(".hXWPE.ixyxcj")
rating_text = [t.text for t in rating]
rating_text

Output

['April 11th 2020, 9:49 pm',
 'April 16th 2020, 9:30 pm',
 'March 16th 2020, 10:32 am',
 'December 22nd 2019, 2:13 am',
 'March 31st 2020, 2:02 am',
 'April 11th 2020, 7:10 pm',
 'April 12th 2020, 2:13 pm',
 'May 15th 2020, 6:00 am',
 'April 13th 2020, 7:39 pm',
 'January 23rd 2020, 4:02 pm']

Can you help me? Thanks

1 answer:

Answer 0 (score: 1)

The dates also match that CSS selector, and because they come before the reviews in the page, they are returned first.
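One way to check this is to print what each matched element actually is. The following is a minimal sketch rather than part of the original answer: `describe_matches` is a hypothetical helper name, and it relies only on the standard WebElement members `tag_name`, `text`, and `get_attribute`.

```python
def describe_matches(elements, max_chars=40):
    """Return (tag, class attribute, text preview) for each matched element.

    `elements` is any iterable of Selenium WebElement objects, for example
    the list returned by the `.hXWPE.ixyxcj` lookup in the question.
    """
    rows = []
    for el in elements:
        # Collapse whitespace so multi-line text fits on one line.
        preview = " ".join(el.text.split())[:max_chars]
        rows.append((el.tag_name, el.get_attribute("class"), preview))
    return rows
```

Calling `describe_matches(rating)` on the list from the question and printing each row shows which elements the selector picked up, in the order they were returned.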

Try the following (here `driver` is the WebDriver instance, named `browser` in the question):

rating = driver.find_elements_by_class_name("review-text")

For the author and the stars...

ratings = driver.find_elements_by_css_selector(".ixyxcj>.fNRjgH")
for rating in ratings:
    author = rating.find_element_by_css_selector('.author-card .name').text
    stars = rating.find_elements_by_css_selector('.ixyxcj .productComponents__SingleStar-sc-1ffpes9-3.kdXCBs')
    print('The author: ' + author + ' gave ' + str(len(stars)) + ' stars')
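
The `find_element(s)_by_*` helpers used above were removed in Selenium 4.3. Here is a minimal sketch of the same loop using the generic `find_element` and `find_elements` methods. It assumes the page still uses the class names from the answer (they look auto-generated and may have changed). `collect_ratings` and `scrape` are hypothetical names introduced here.

```python
# The string "css selector" is the value of selenium's By.CSS_SELECTOR.
CSS = "css selector"


def collect_ratings(driver):
    """Return a list of (author, star_count) tuples for the loaded page."""
    results = []
    # Selectors copied from the answer; assumed to still match the site.
    for review in driver.find_elements(CSS, ".ixyxcj>.fNRjgH"):
        author = review.find_element(CSS, ".author-card .name").text
        stars = review.find_elements(
            CSS, ".ixyxcj .productComponents__SingleStar-sc-1ffpes9-3.kdXCBs"
        )
        results.append((author, len(stars)))
    return results


def scrape(url):
    """Open `url` in Chrome, collect the ratings, and close the browser."""
    from selenium import webdriver  # imported lazily; requires selenium 4+

    driver = webdriver.Chrome()  # assumes a chromedriver can be located
    try:
        driver.get(url)
        return collect_ratings(driver)
    finally:
        driver.quit()
```

Calling `scrape("https://www.influenster.com/reviews/ferrero-rocher-chocolate")` should then return the pairs that the loop above prints, provided the selectors still match.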