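I am struggling to use Python to scrape each user's age group and number of cities visited from the review section on TripAdvisor. Please see the photo.
Please use this link: https://www.tripadvisor.com.au/Hotel_Review-g56003-d266157-Reviews-Magnolia_Hotel_Houston-Houston_Texas.html. Not all users have provided their age group. This is the code I have been working on so far.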
from selenium import webdriver
import time
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
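url = 'https://www.tripadvisor.com.au/Hotel_Review-g56003-d266157-Reviews-Magnolia_Hotel_Houston-Houston_Texas.html'
# The original never created `browser`; this assumes a Chrome driver is available
browser = webdriver.Chrome()
browser.get(url)
# memberOverlayLink is a class name, so wait on By.CLASS_NAME (not By.ID) until it is visible
info = WebDriverWait(browser, 10).until(EC.visibility_of_element_located((By.CLASS_NAME, 'memberOverlayLink')))
info.click()
# `reviewBox` was never defined in the original, so search from `browser` instead
for photo in browser.find_elements(By.CLASS_NAME, 'innerContent'):
    age_group = ''
    try:
        # A compound selector needs By.CSS_SELECTOR; By.CLASS_NAME accepts a single class only
        age = photo.find_element(By.CSS_SELECTOR, '.memberdescriptionReviewEnhancements li + li')
        age_group = ' '.join(age.text.split()[0:1])
    except NoSuchElementException:
        # Not every user lists an age group
        age_group = ''
    cities_visited = ''
    try:
        visit = photo.find_element(By.CLASS_NAME, 'badgeTextReviewEnhancements')
        # Keep the first token of the badge text (the original split age_group here by mistake)
        cities_visited = ' '.join(visit.text.split()[0:1])
    except NoSuchElementException:
        cities_visited = ''
# Close the member overlay once the fields have been read
close = browser.find_element(By.CLASS_NAME, 'ui_close_x')
close.click()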
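To make the text handling easier to check without a browser, here is a minimal sketch of the parsing step on its own. The helper names (`first_token`, `parse_age_group`, `parse_cities_visited`) and the sample strings are hypothetical. I am assuming the age line starts with a range such as `35-49` or `65+`, and that the badge text starts with a number, which may not match what the page actually returns.

```python
import re


def first_token(text):
    """Return the first whitespace-separated token of text, or '' if there is none."""
    parts = text.split()
    return parts[0] if parts else ''


def parse_age_group(text):
    """Return a leading age range such as '35-49' or '65+', or '' if there is none."""
    token = first_token(text)
    return token if re.fullmatch(r'\d+(-\d+|\+)', token) else ''


def parse_cities_visited(text):
    """Return the leading integer of a badge text, or None if it does not start with one."""
    token = first_token(text).replace(',', '')
    return int(token) if token.isdigit() else None


# Hypothetical sample strings, not copied from the real page
print(parse_age_group('35-49 man from Sydney'))
print(parse_cities_visited('27 Cities visited'))
print(repr(parse_age_group('From Houston, Texas')))
```

With helpers like these, `age.text` and `visit.text` from the loop above could be passed in directly, and a user with no age group would simply yield an empty string.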
If anyone knows how to scrape these, please guide me. Thanks!