I am using BeautifulSoup to scrape this URL.
I want to grab every DIV that comes after the Our Features heading:
if hotel_meetings_soup.select("div#contentArea div.highlightBox"):
    print(hotel_meetings_soup.select("div#contentArea"))  # debug 1
    exit(0)
for meeting in hotel_meetings_soup.select("div#contentArea div.highlightBox"):
    print("\n Feature start here\n")
    print(meeting)
    # Rest of code
All of the DIVs have the same class, highlightBox, but I don't know why debug 1 prints only the markup for the DIV containing
Number Of Guest Rooms: 500
Number Of Meeting Spaces: 29
Largest Meeting Space: 17,377 sq ft (1,614.28 sq.m)
but not the others.
Answer 0 (score: 0)
The idea is to first locate the Our Features h3 element by its text, then use find_next_siblings() to find the matching siblings that follow it:
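import requests
from bs4 import BeautifulSoup

url = 'http://www.starwoodhotels.com/sheraton/property/meetings/index.html?language=en_US&propertyID=1391'
# Send a browser-like User-Agent header with the request
response = requests.get(url, headers={
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
})
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(response.content, 'html.parser')

# Find the "Our Features" text node; its parent is the h3 heading
features = soup.find(text='Our Features')
# Iterate over the heading's following siblings that are div.highlightBox
for div in features.parent.find_next_siblings('div', class_='highlightBox'):
    print(div.text.strip())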
Prints:
Weddings
Host a beautiful wedding in the Valley of the Sun with our spectacular views, lush ceremony lawns, and upscale ballrooms with pre-function space. Stellar catering and superb service ensure a amazing day. More >
...
Get Rewarded
Earn Starpoints® and eligible nights toward SPG elite status on your next meeting or event. More >
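To try the same approach without hitting the live site, here is a minimal offline sketch. The HTML snippet is invented for illustration and only assumes the layout described in the question (one highlightBox DIV before the h3, the rest after it as siblings); the helper name features_after_heading is hypothetical as well.

```python
from bs4 import BeautifulSoup

# Made-up markup, NOT the real page: it only mimics the assumed layout.
HTML = """
<div id="contentArea">
  <div class="highlightBox">Number Of Guest Rooms: 500</div>
  <h3>Our Features</h3>
  <div class="highlightBox">Weddings</div>
  <p>unrelated paragraph</p>
  <div class="highlightBox">Get Rewarded</div>
</div>
"""


def features_after_heading(html, heading_text="Our Features"):
    """Return the text of each div.highlightBox that follows the heading."""
    soup = BeautifulSoup(html, "html.parser")
    # Match the h3 whose only string child equals heading_text.
    heading = soup.find("h3", string=heading_text)
    if heading is None:
        # Heading missing: return an empty list instead of raising.
        return []
    # find_next_siblings() only looks at later siblings of the h3,
    # so the guest-rooms box placed before the heading is skipped.
    siblings = heading.find_next_siblings("div", class_="highlightBox")
    return [div.get_text(strip=True) for div in siblings]


titles = features_after_heading(HTML)
print(titles)
```

Guarding against a missing heading matters in practice: if the page changes or the request returns an error page, soup.find() returns None, and calling .parent or find_next_siblings() on it would raise an AttributeError.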