Scraping the text after every <br> - BeautifulSoup

Date: 2014-12-06 18:00:11

Tags: python python-3.x beautifulsoup

I want to scrape this URL: http://deals.whotels.com/W-Guangzhou-3126/tnc/1680/24900/en and I need to get each of the terms & conditions. I have the HTML markup in test:

test = terms_and_conditions_soup.select(".popBodyText")[0]
for tes in test.findAll('br'):
    print(tes.extract())

But this only prints the <br> tags.

I can get all of the text with something like terms_and_conditions_soup.select(".popBodyText p")[0].text, but that is not what I want.

I can't even think of any logic for separating out the individual terms.

1 answer:

Answer 0 (score: 2)

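The terms and conditions are just lines of text separated by <br> tags. You can use the .get_text() method to get all of the text, with a newline between each piece:
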
terms_elements = terms_and_conditions_soup.select(".popBodyText")[0]
terms = terms_elements.get_text('\n', strip=True)
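
To try this without fetching the live page, here is a minimal sketch against a made-up fragment. The SAMPLE_HTML markup is an assumption for illustration, not the real page source, and the "html.parser" argument is simply one parser choice. The first argument to get_text() is the separator placed between text nodes, and strip=True trims each node and drops the empty ones.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption).
SAMPLE_HTML = """
<div class="popBodyText">
  Terms &amp; Conditions<br>
  \u2022 Offer valid at W Guangzhou only.<br>
  \u2022 Limited number of rooms available.<br>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
terms_elements = soup.select(".popBodyText")[0]

# Join the text nodes with newlines; strip=True trims each node
# and skips nodes that are empty after trimming.
terms = terms_elements.get_text("\n", strip=True)

# One entry per <br>-separated line.
lines = terms.split("\n")
print(lines)
```

Splitting on the same separator gives one list entry per line, which is usually easier to work with than a single string.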

Or you can iterate over the .strings or .stripped_strings generators:

terms = list(terms_elements.stripped_strings)
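
As a rough illustration of how the two generators differ, the sketch below uses a small invented fragment (the html string is an assumption, not taken from the page). .strings yields every text node as-is, including whitespace-only ones, while .stripped_strings trims each node and skips the ones that end up empty.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; the real page has many more lines.
html = '<div class="popBodyText">  First term. <br>\n  Second term. <br>\n</div>'
element = BeautifulSoup(html, "html.parser").find(class_="popBodyText")

# Every text node exactly as it appears in the markup.
raw = list(element.strings)

# Same nodes, trimmed, with whitespace-only nodes dropped.
clean = list(element.stripped_strings)

print(raw)
print(clean)
```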

If you only want the bulleted lines, select just those:

terms = [t.lstrip('\u2022 ') for t in terms_elements.stripped_strings
         if t.startswith('\u2022')]

I also stripped the bullet from each selected line.
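
The same filter can be wrapped in a small helper, sketched here under a few assumptions: bullet_terms and its css_class parameter are hypothetical names, and the sample markup is invented. One detail worth knowing is that str.lstrip() treats its argument as a set of characters, so lstrip('\u2022 ') removes any leading run of bullets and spaces, not just one bullet followed by one space.

```python
from bs4 import BeautifulSoup

BULLET = "\u2022"


def bullet_terms(html, css_class="popBodyText"):
    """Return the bulleted lines inside the first element with css_class.

    Hypothetical helper for illustration; returns [] if nothing matches.
    """
    element = BeautifulSoup(html, "html.parser").find(class_=css_class)
    if element is None:
        return []
    # lstrip() strips any leading mix of bullet and space characters.
    return [
        s.lstrip(BULLET + " ")
        for s in element.stripped_strings
        if s.startswith(BULLET)
    ]


# Invented sample markup (an assumption, not the real page source).
sample = (
    '<div class="popBodyText">Terms &amp; Conditions<br>'
    "\u2022 Offer valid at W Guangzhou only.<br>"
    "\u2022\u2022  Limited number of rooms available.<br>"
    "Not a bulleted line.</div>"
)

print(bullet_terms(sample))
```

Lines that do not start with a bullet, such as the heading, are left out of the result.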

A demo of the latter approach:

>>> import requests
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(requests.get('http://deals.whotels.com/W-Guangzhou-3126/tnc/1680/24900/en').content)
>>> terms_elements = soup.find(class_='popBodyText')
>>> [t.lstrip('\u2022 ') for t in terms_elements.stripped_strings if t.startswith('\u2022')]
['Offer valid at W Guangzhou only.', 'Offer is valid for stays booked by December 30, 2014 and stays completed from December 30, 2014 to January 1, 2015.', 'Limited number of rooms available.', 'Minimum stay of 2 nights is required & must stay over December 31, 2014.', '15% service charge and tax is not included in the package and subject to change without any notice.', 'Breakfast to be consumed at the Kitchen Table restaurant on departure day. Guest will be eligible for the breakfast based on number of persons booked overnight. Additional persons will be charged at the restaurant according to retail price.', 'NYE dinner buffet to be consumed at The Kitchen Table on December 31, 2014 only. Two guests per room will be eligible for the dinner buffet, and additional guests will be charged at the restaurant according to retail price. Prior reservations for the additional guests are required.', 'Free access to the FEI NYEcountdown party on December 31, 2014 is limited to a maximum of 2 adults only per room. Guests under 18 years old will not be allowed. The tickets will not be sold to general public or any external guests. Please collect the passes at time of check in.', 'Alcoholic beverage service is restricted to those 18 years or older (with valid identification).', 'Massage treatment in the package is limited to 60min AWAY Spa Signature Massage only. Spa treatment cannot be cumulated & valid during stay only. Prior reservation is recommended for the Spa treatment. This is to ensure space availability and the hotel will not be held responsible for any unconsumed portion of the package.', 'All package components are not transferable and must be consumed during stay. 
If any portion is not consumed, they will not be refundable or exchangeable in cash.', 'Extra services & amenities not part of the package will be charged per consumption & will be on guest’s own expense.', 'All package amenities are per room/per night and will be presented upon arrival unless otherwise noted.', 'This offer is only available if booked via Starwood distribution channels. Offer will not be applicable if booked through third party distribution channels, travel agents or any other external websites.', 'Offer not applicable to groups nor is it combinable with other special/discounted rates.', 'Starwood Hotels & Resorts Worldwide, Inc. reserves the right to cancel this promotion at anytime without notice.', 'Not responsible for omissions or typographical errors. Void where prohibited by law. Not to be combined with offers or promotions.', 'Any unused portion/s of the package is not transferable or exchangeable for cash/credit.', 'Starpoints, SPG, Starwood Preferred Guest, Sheraton, Four Points, W, Aloft, Le Meridien, The Luxury Collection, Element, Westin, St. Regis and their respective logos are the trademarks of Starwood Hotels & Resorts Worldwide, Inc., or its affiliates.']