Unable to scrape the desired DIVs - BeautifulSoup

Date: 2015-01-04 16:28:45

Tags: python html python-3.x web-scraping beautifulsoup

I am scraping this URL with BeautifulSoup.

I want to scrape every DIV that comes after the Our Features heading:

if hotel_meetings_soup.select("div#contentArea div.highlightBox"):
    print(hotel_meetings_soup.select("div#contentArea")) # debug 1
    exit(0)
    for meeting in hotel_meetings_soup.select("div#contentArea div.highlightBox"):
        print("\n Feature start here\n")
        print(meeting)
        # Rest of code

All of the DIVs share the same class, highlightBox, but I can't tell why debug 1 prints only the markup of the DIV containing:

Number Of Guest Rooms:  500
Number Of Meeting Spaces:   29
Largest Meeting Space:  17,377 sq ft (1,614.28 sq.m)

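and not the others.

1 Answer:

Answer 0: (score: 0)

The idea is to first locate the Our Features h3 element by its text, then use find_next_siblings() to collect the matching siblings that follow it: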

import requests
from bs4 import BeautifulSoup

url = 'http://www.starwoodhotels.com/sheraton/property/meetings/index.html?language=en_US&propertyID=1391'
response = requests.get(url, headers={
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
})

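# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(response.content, 'html.parser')
# Locate the text node of the "Our Features" heading
features = soup.find(text='Our Features')

# features.parent is the h3; iterate over the highlightBox DIVs that follow it
for div in features.parent.find_next_siblings('div', class_='highlightBox'):
    print(div.text.strip())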

Prints:

Weddings
Host a beautiful wedding in the Valley of the Sun with our spectacular views, lush ceremony lawns, and upscale ballrooms with pre-function space. Stellar catering and superb service ensure a amazing day. More >
...
Get Rewarded
Earn Starpoints® and eligible nights toward SPG elite status on your next meeting or event. More >
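
Since the live page may change or become unavailable, here is a minimal offline sketch of the same technique. The HTML string is a made-up stand-in for the page (its structure is an assumption, not a copy of the real markup), and features_after_heading is a hypothetical helper name. It shows that a highlightBox DIV placed before the heading is skipped, and that a missing heading yields an empty list instead of an AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the structure is assumed.
HTML = """
<div id="contentArea">
  <div class="highlightBox">Number Of Guest Rooms: 500</div>
  <h3>Our Features</h3>
  <div class="highlightBox">Weddings</div>
  <p>Unrelated paragraph</p>
  <div class="highlightBox">Get Rewarded</div>
</div>
"""


def features_after_heading(html, heading_text="Our Features"):
    """Return the text of each highlightBox DIV that follows the given h3."""
    soup = BeautifulSoup(html, "html.parser")
    # Match the h3 whose only string is exactly heading_text
    heading = soup.find("h3", string=heading_text)
    if heading is None:
        # Heading not present: return nothing rather than fail on None
        return []
    # Only siblings that come after the heading are considered
    return [
        div.get_text(strip=True)
        for div in heading.find_next_siblings("div", class_="highlightBox")
    ]


print(features_after_heading(HTML))
```

The DIV holding the guest room count shares the same class but sits before the heading, so find_next_siblings() never visits it.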