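I'm trying to learn web scraping with BS4 + Selenium. The site is TripAdvisor.
Each review's text has a "More" span; clicking it loads the rest of the text into the same div via AJAX.
But my code prints the review text as it was before Selenium's click on the "More" button took effect.
How can I use Selenium to click the "More" button and then scrape the full text?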
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

def openUrl(link):
    driver = webdriver.Firefox()
    driver.get(link)
    # Click the first "More" link (Selenium 4 locator API; find_element_by_xpath was removed)
    elem1 = driver.find_element(By.XPATH, "//span[@class='taLnk ulBlueLinks']")
    elem1.click()
    # page_source is read right after the click, before the AJAX request has finished
    html_source = driver.page_source
    driver.quit()
    soup = BeautifulSoup(html_source, 'lxml')
    foundDiv = soup.find_all("div", {"class": "review-container"})
    for reviewContainer in foundDiv:
        ratingText = reviewContainer.select_one(".partial_entry").text
        print(ratingText)

openUrl("https://www.tripadvisor.in/Hotel_Review-g1010231-d1065009-Reviews-Radisson_Blu_Resort_Spa_Alibaug-Alibaug_Raigad_District_Maharashtra.html")
But BS4 scrapes the data without waiting for the "More" click to finish loading.
Please help.
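The symptom comes from a timing gap: click() returns as soon as the click has been dispatched, the full text arrives later via AJAX, and page_source is only a snapshot of the DOM at the moment it is read. The following standard-library sketch simulates that race without a browser. FakePage, click_more and the 0.2 s delay are hypothetical stand-ins for the Firefox driver and the real network latency, not Selenium APIs.

```python
import threading
import time


class FakePage:
    """Hypothetical stand-in for a browser tab: the review text starts
    truncated and is completed by a background "AJAX" callback."""

    def __init__(self, delay=0.2):
        self.review_text = "First part of the review..."
        self.delay = delay  # assumed latency of the AJAX request

    def click_more(self):
        # Like WebElement.click(), this returns immediately; the content
        # update happens later on another thread.
        def ajax_done():
            self.review_text = ("First part of the review, "
                                "followed by the text loaded via AJAX.")
        threading.Timer(self.delay, ajax_done).start()

    @property
    def page_source(self):
        # A snapshot of the current state, like driver.page_source
        return self.review_text


page = FakePage(delay=0.2)
page.click_more()
snapshot_now = page.page_source   # read immediately: still truncated
time.sleep(0.5)                   # crude fixed wait, for illustration only
snapshot_later = page.page_source # read after the simulated AJAX finished
print(snapshot_now)
print(snapshot_later)
```

A fixed sleep happens to work here, but it is either too long or too short on a real site, which is why an explicit wait on a condition is the usual fix.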
Answer (score: 0)
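See the WebDriverWait example below: after clicking "More", wait until the review's loading overlay (div.loadingShade) has disappeared, and only then read page_source.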
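from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get('https://www.tripadvisor.in/Hotel_Review-g1010231-d1065009-Reviews-Radisson_Blu_Resort_Spa_Alibaug-Alibaug_Raigad_District_Maharashtra.html')
# Click the "More" link, which requests the full review text via AJAX
moreButton = driver.find_element(By.CSS_SELECTOR, "span.taLnk.ulBlueLinks")
moreButton.click()
# Wait up to 10 s for the loading overlay on that review to disappear
wait = WebDriverWait(driver, 10)
element = wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, "div[data-reviewid='493434022'] div.loadingShade")))
html_source = driver.page_source
print(html_source)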
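WebDriverWait.until works by polling: it calls the condition repeatedly (every 0.5 s by default) until the condition returns something truthy, returns that value, and raises TimeoutException once the timeout (10 s in the answer) has passed. A minimal standard-library sketch of that loop follows, using a hypothetical wait_until helper. Real Selenium also passes the driver to the condition and ignores NoSuchElementException while polling; both are omitted here, and the sketch raises the built-in TimeoutError instead of Selenium's exception.

```python
import time


def wait_until(condition, timeout=10.0, poll_frequency=0.5):
    """Hypothetical helper sketching WebDriverWait(driver, timeout).until(...):
    call condition() until it returns a truthy value and return that value,
    giving up after `timeout` seconds."""
    end_time = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() > end_time:
            # Selenium raises TimeoutException here; this sketch uses TimeoutError
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Usage: a fake "loading overlay" that is gone on the third check,
# standing in for EC.invisibility_of_element_located(...)
checks = {"n": 0}


def overlay_gone():
    checks["n"] += 1
    return checks["n"] >= 3


result = wait_until(overlay_gone, timeout=5, poll_frequency=0.01)
print(result, checks["n"])  # → True 3
```

In the answer, the condition is the invisibility of div.loadingShade inside the review with data-reviewid 493434022, so the wait ends as soon as that review's loading shade is gone, and page_source is read only afterwards.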