Problem scraping hotel prices from booking.com

Time: 2020-05-18 12:32:50

Tags: python web-scraping beautifulsoup empty-list

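I have just started learning Python and am trying to extract hotel information from booking.com. Ideally, the result is three separate printed lists: hotel names, hotel links, and hotel prices.

My code worked at first, and I was able to export the output to another file. A few days later, when I ran the same code again, I still got the hotel names and links without any problem, but I could no longer get the hotel prices: all I got was an empty list.

Please see my code below. Can anyone help me figure out what is going wrong?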

import lxml
from bs4 import BeautifulSoup
from bs4 import NavigableString
import selenium
from selenium import webdriver

url = "https://www.booking.com/searchresults.en-gb.html?aid=356980&label=gog235jc-1FCAsoqQFCCHB1bGl0emVySDNYA2ipAYgBAZgBCbgBF8gBDNgBAegBAfgBDIgCAagCA7gCy_H09QXAAgE&sid=11df088cc51daa70cb9f00a605f6318d&sb=1&src=hotel&src_elem=sb&error_url=https%3A%2F%2Fwww.booking.com%2Fhotel%2Fnl%2Fpulitzer.en-gb.html%3Faid%3D356980%3Blabel%3Dgog235jc-1FCAsoqQFCCHB1bGl0emVySDNYA2ipAYgBAZgBCbgBF8gBDNgBAegBAfgBDIgCAagCA7gCy_H09QXAAgE%3Bsid%3D11df088cc51daa70cb9f00a605f6318d%3Ball_sr_blocks%3D1052724_127806409_0_2_0%3Bcheckin%3D2020-06-01%3Bcheckout%3D2020-06-02%3Bdest_id%3D-2140479%3Bdest_type%3Dcity%3Bdist%3D0%3Bgroup_adults%3D2%3Bgroup_children%3D0%3Bhapos%3D1%3Bhighlighted_blocks%3D1052724_127806409_0_2_0%3Bhpos%3D1%3Bno_rooms%3D1%3Broom1%3DA%252CA%3Bsb_price_type%3Dtotal%3Bshow_room%3D1052724%3Bsr_order%3Dpopularity%3Bsr_pri_blocks%3D1052724_127806409_0_2_0__52108%3Bsrepoch%3D1589459170%3Bsrpvid%3Dff6f5770f10b008a%3Btype%3Dtotal%3Bucfs%3D1%26%3B&highlighted_hotels=10527&hp_sbox=1&ss=Amsterdam&is_ski_area=0&ssne=Amsterdam&ssne_untouched=Amsterdam&dest_id=-2140479&dest_type=city&checkin_year=2020&checkin_month=7&checkin_monthday=1&checkout_year=2020&checkout_month=7&checkout_monthday=2&group_adults=1&group_children=0&no_rooms=1&from_sf=1"

# Run Chrome without opening a visible browser window.
options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options, executable_path=r'C:\Users\abc\AppData\Local\Google\Chrome\chromedriver.exe')

driver.get(url)
src = driver.page_source
soup = BeautifulSoup(src, "lxml")

hotel_names = []
for hotel_name in soup.find_all("span", class_="sr-hotel__name"):
    hotel_name = hotel_name.string.replace('\n', '')
    hotel_names.append(hotel_name)


hotel_urls = []
booking_url = 'https://www.booking.com'

# Each result link holds a relative href; prefix it with the site root.
for hotel_src in soup.find_all('a', class_='hotel_name_link url'):
    hotel_src = '{}{}'.format(booking_url, hotel_src['href']).replace('\n', '')
    hotel_urls.append(hotel_src)

hotel_prices = []
for hotel_price in soup.find_all(class_='bui-price-display__value prco-text-nowrap-helper prco-inline-block-maker-helper'):
    hotel_price = hotel_price.string
    hotel_price = ' '.join(hotel_price.split())
    hotel_prices.append(hotel_price)

print(hotel_names)
print(hotel_urls)
print(hotel_prices)
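
One thing that may be worth checking is how strict the price lookup is. Passing the full three-class string to `class_` only matches elements whose `class` attribute is exactly that string, so the lookup returns nothing if the site adds, drops, or reorders a class. Below is a minimal sketch of a looser lookup. It assumes the price element still carries the `bui-price-display__value` class, which may no longer be true on the live page. The `SAMPLE_HTML` markup and the `extract_prices` helper are made up for illustration and are not taken from booking.com.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on the class names used above.
# The real booking.com page may differ and may change over time.
SAMPLE_HTML = """
<div class="bui-price-display__value prco-inline-block-maker-helper">
  <span>€ 120</span>
</div>
<div class="bui-price-display__value prco-text-nowrap-helper prco-inline-block-maker-helper">
  € 95
</div>
"""


def extract_prices(html):
    """Return the text of every element that has the price class."""
    soup = BeautifulSoup(html, "html.parser")
    prices = []
    # Select on a single class so extra or reordered classes still match.
    for tag in soup.select(".bui-price-display__value"):
        # get_text() also works when the price sits inside a nested tag,
        # a case in which .string may return None.
        text = " ".join(tag.get_text().split())
        if text:
            prices.append(text)
    return prices


prices = extract_prices(SAMPLE_HTML)
print(prices)
```

If this kind of lookup still returns an empty list on the real page, the class names have probably changed, or the prices are filled in after `driver.page_source` is read. Inspecting the saved page source for the price text would help tell the two cases apart.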

0 answers:

No answers