Trying to scrape a TripAdvisor member's reviews using BeautifulSoup

Date: 2016-08-05 11:39:08

Tags: python web-scraping beautifulsoup bs4

I am trying to scrape this user profile to collect his reviews of hotels and of restaurants separately: https://www.tripadvisor.in/members-reviews/rahuls896

The problem is that when I read the page through BeautifulSoup, it shows all of the reviews by default. That is, class="active" is assigned to "REVIEWS_ALL" by default:

<li data-filter="REVIEWS_ALL" class="active">All</li>
<li data-filter="REVIEWS_HOTELS">Hotels (1)</li>
<li data-filter="REVIEWS_RESTAURANTS">Restaurants (1)</li>

But I want class="active" to be assigned to "REVIEWS_HOTELS" instead:

<li data-filter="REVIEWS_ALL">All</li>
<li data-filter="REVIEWS_HOTELS" class="active">Hotels (1)</li>
<li data-filter="REVIEWS_RESTAURANTS">Restaurants (1)</li>

How can I automate this?

1 answer:

Answer 0 (score: 3)

Just scrape the entire content for the user, then separate it by category (hotels, restaurants) according to your requirements.

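from selenium import webdriver
from selenium.webdriver.common.by import By

# Open the member's review page in a real browser so the page's JavaScript runs
driver = webdriver.Firefox()
driver.get('https://www.tripadvisor.in/members-reviews/rahuls896')
# Click the pagination "next" button; find_element_by_id was removed in Selenium 4
next_button = driver.find_element(By.ID, "cs-paginate-next")
next_button.click()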
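
Once the page HTML is in hand, BeautifulSoup can report which filter tab is currently active, which is a quick way to check what a given response actually contains. The following is only a minimal sketch: the markup is the filter list copied from the question, and the helper names `active_filter` and `available_filters` are made up for illustration.

```python
from bs4 import BeautifulSoup

# Filter markup copied from the question (the state with "Hotels" selected).
SAMPLE_HTML = """
<li data-filter="REVIEWS_ALL">All</li>
<li data-filter="REVIEWS_HOTELS" class="active">Hotels (1)</li>
<li data-filter="REVIEWS_RESTAURANTS">Restaurants (1)</li>
"""


def active_filter(html):
    """Return the data-filter value of the <li> with class "active", or None."""
    soup = BeautifulSoup(html, "html.parser")
    tab = soup.select_one("li.active[data-filter]")
    return tab["data-filter"] if tab else None


def available_filters(html):
    """Return every data-filter value in document order."""
    soup = BeautifulSoup(html, "html.parser")
    return [li["data-filter"] for li in soup.select("li[data-filter]")]


print(active_filter(SAMPLE_HTML))
print(available_filters(SAMPLE_HTML))
```

With the Selenium session above, one could presumably click the element matching the CSS selector li[data-filter="REVIEWS_HOTELS"] and then pass driver.page_source to active_filter. That selector is inferred from the question's markup and has not been checked against the live site.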