Question

I am scraping this URL using BeautifulSoup.

I want to scrape every DIV that follows the Our Features heading:

if hotel_meetings_soup.select("div#contentArea div.highlightBox"):
    print(hotel_meetings_soup.select("div#contentArea")) # debug 1
    exit(0)
    for meeting in hotel_meetings_soup.select("div#contentArea div.highlightBox"):
        print("\n Feature start here\n")
        print(meeting)
        # Rest of code

All of the DIVs have the same class, highlightBox, but I don't know why debug 1 prints only the markup of the DIV containing

Number Of Guest Rooms:  500
Number Of Meeting Spaces:   29
Largest Meeting Space:  17,377 sq ft (1,614.28 sq.m)

and not the others.

Answer 1

The idea is to first locate the Our Features h3 element by its text, then use find_next_siblings() to collect the matching siblings that follow it:

import requests
from bs4 import BeautifulSoup

url = 'http://www.starwoodhotels.com/sheraton/property/meetings/index.html?language=en_US&propertyID=1391'
response = requests.get(url, headers={
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
})

soup = BeautifulSoup(response.content)

features = soup.find(text='Our Features')
for div in features.parent.find_next_siblings('div', class_='highlightBox'):
    print(div.text.strip())

Prints:

Weddings Host a beautiful wedding in the Valley of the Sun with our spectacular views, lush ceremony lawns, and upscale ballrooms with pre-function space. Stellar catering and superb service ensure a amazing day. More >
...
Get Rewarded Earn Starpoints® and eligible nights toward SPG elite status on your next meeting or event. More >
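
The same technique can be tried offline, without fetching the live page. The sketch below is only an illustration: SAMPLE_HTML is made-up markup (the real page's structure is an assumption here), feature_texts is a hypothetical helper name, and the parser is set explicitly to html.parser so the result does not depend on which parser happens to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; its structure is assumed.
SAMPLE_HTML = """
<div id="contentArea">
  <div class="highlightBox">Number Of Guest Rooms: 500</div>
  <h3>Our Features</h3>
  <div class="highlightBox">Weddings</div>
  <p>Unrelated paragraph</p>
  <div class="highlightBox">Get Rewarded</div>
</div>
"""


def feature_texts(html):
    """Return the text of each highlightBox div that follows the heading."""
    soup = BeautifulSoup(html, "html.parser")
    # Locate the h3 whose text is exactly "Our Features".
    heading = soup.find("h3", string="Our Features")
    if heading is None:
        return []
    # Only siblings that come after the heading are considered,
    # so the highlightBox div before it is skipped.
    siblings = heading.find_next_siblings("div", class_="highlightBox")
    return [div.get_text(strip=True) for div in siblings]


print(feature_texts(SAMPLE_HTML))
```

With this sample markup, the call returns ['Weddings', 'Get Rewarded']: the div that precedes the heading and the unrelated paragraph are both left out.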
