用硒刮擦TripAdvisor

时间:2019-06-11 01:19:20

标签: python selenium

我正在尝试使用硒从TripAdvisor那里获取一些数据。我成功获得了评论,评分和一些日期,但是当我尝试获取文稿,评论者的位置和帮助投票时,我总是从页面上获得5次第一结果。其他数据则不会发生。

您可以在下面看到我的代码

import csv
import time
from selenium import webdriver

container = driver.find_elements_by_xpath("//div[@class='hotels-review-list-parts-SingleReview__reviewContainer--d54T4']")

num_page_items = len(container)
for j in range(num_page_items):
    # to save the data
    string = container[j].find_element_by_xpath(".//span[contains(@class, 'ui_bubble_rating bubble_')]").get_attribute("class")
    rating = string.split("_")
    review=container[j].find_element_by_xpath(".//q[@class='hotels-review-list-parts-ExpandableReview__reviewText--3oMkH']").text.replace("\n","")
    check_in=container[j].find_element_by_xpath(".//div[@class='hotels-review-list-parts-EventDate__event_date--CRXs4']").text.replace("\n","").replace("Date of stay: ","")
    name_remove=container[j].find_element_by_xpath(".//a[@class='ui_header_link social-member-event-MemberEventOnObjectBlock__member--35-jC']").text.replace("\n","")
    review_date=container[j].find_element_by_xpath(".//div[@class='social-member-event-MemberEventOnObjectBlock__event_type--3njyv']").text.replace(name_remove,"").replace(" wrote a review ","")

    #data location,contributions,helpful_vote may not be always available and we use if to check and give values

    if (check_exists_by_xpath('//div[@class="social-member-MemberHeaderStats__event_info--30wFs"]')):
        location=container[j].find_element_by_xpath('//span[@class="social-member-MemberHeaderStats__hometown_stat_item--231iN"]').text.replace("\n","")
        time.sleep(5)
    else:
        location=""

    if (check_exists_by_xpath('//div[@class="social-member-MemberHeaderStats__event_info--30wFs"]/span[3]')):
        contributions=container[j].find_element_by_xpath('//div[@class="social-member-MemberHeaderStats__event_info--30wFs"]/span[2]').text.replace("\n","").replace(" contributions","").replace(" helpful votes","").replace(" helpful vote","")
        helpfull_votes=container[j].find_element_by_xpath('//div[@class="social-member-MemberHeaderStats__event_info--30wFs"]/span[3]').text.replace("\n","").replace(" helpful votes","").replace(" helpful vote","")

    elif(check_exists_by_xpath('//div[@class="social-member-MemberHeaderStats__event_info--30wFs"]/span[2]')):
        contributions=container[j].find_element_by_xpath('//div[@class="social-member-MemberHeaderStats__event_info--30wFs"]/span[2]').text.replace("\n","").replace(" contributions","").replace(" helpful votes","").replace(" helpful vote","")
        helpfull_votes=""
rating  review_date stay    location            contributions   helpful_votes
50      9-Jun       Jun-19  Bucharest, Romania  1               23
50      8-Jun       Jun-19  Bucharest, Romania  1               23
50      6-Jun       Jun-19  Bucharest, Romania  1               23
50      4-Jun       May-19  Bucharest, Romania  1               23
50      May-19      May-19  Bucharest, Romania  1               23
50      May-19      May-19  Monaco              10              1
50      May-19      May-19  Monaco              10              1
50      May-19      May-19  Monaco              10              1
50      May-19      May-19  Monaco              10              1
50      May-19      May-19  Monaco              10              1
50      May-19      May-19  Limassol, Cyprus    4               2
50      May-19      May-19  Limassol, Cyprus    4               2

您可以看到staylocationcontributionshelpful_votes保持5次相同。

0 个答案:

没有答案