Unable to get hotel prices from booking.com

Time: 2016-06-13 08:17:46

Tags: python-2.7 beautifulsoup

I am trying to scrape hotel prices from booking.com, but I can't figure out why I get an empty list back when I select by class with beautifulsoup4. My code is given below.

import webbrowser, requests
from bs4 import BeautifulSoup


res = requests.get("http://www.booking.com/searchresults.html?label=gen173nr-1FCAEoggJCAlhYSDNiBW5vcmVmaGyIAQGYATG4AQjIAQzYAQHoAQH4AQKoAgM&sid=c24fad210186ae699e89a0d3cab10039&dcid=4&checkin_monthday=18&checkin_year_month=2016-6&checkout_monthday=19&checkout_year_month=2016-6&class_interval=1&dest_id=-2092511&dest_type=city&group_adults=2&group_children=0&hlrd=0&label_click=undef&nflt=ht_id%3D204%3B&no_rooms=1&review_score_group=empty&room1=A%2CA&sb_price_type=total&sb_travel_purpose=business&score_min=0&src_elem=sb&ss=Kolkata%2C%20West%20Bengal%2C%20India&ss_raw=kolka&ssb=empty&order=score")
res.status_code
soup = BeautifulSoup(res.text,"lxml")
name = []
rating = []

hotel_name = soup.select('.sr-hotel__name')
hotel_price = soup.select('tr', class_='roomPrice')
hotel_rating = soup.select('.js--hp-scorecard-scoreval')

print hotel_price
for i in range(0, 10):
    name.append(hotel_name[i].contents[0])
    rating.append(hotel_rating[i].contents[0])
    #print name[i]
    #print rating[i]

1 Answer:

Answer 0 (score: 2)

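I had to do two things: 1. add a user agent, and 2. change the selectors. The source you get when scraping is actually different from what you see when you right-click in the browser and choose View Source: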

In [7]: head = {"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"}

In [8]: url = "http://www.booking.com/searchresults.html?label=gen173nr-1FCAEoggJCAlhYSDNiBW5vcmVmaGyIAQGYATG4AQjIAQzYAQHoAQH4AQKoAgM&sid=c24fad210186ae699e89a0d3cab10039&dcid=4&checkin_monthday=18&checkin_year_month=2016-6&checkout_monthday=19&checkout_year_month=2016-6&class_interval=1&dest_id=-2092511&dest_type=city&group_adults=2&group_children=0&hlrd=0&label_click=undef&nflt=ht_id%3D204%3B&no_rooms=1&review_score_group=empty&room1=A%2CA&sb_price_type=total&sb_travel_purpose=business&score_min=0&src_elem=sb&ss=Kolkata%2C%20West%20Bengal%2C%20India&ss_raw=kolka&ssb=empty&order=score"

In [9]: res = requests.get(url, headers=head)

In [10]: soup = BeautifulSoup(res.text,"html.parser")

In [11]: hotels = soup.select("#hotellist_inner div.sr_item.sr_item_new")

In [12]: for hotel in hotels:
   ....:         name = hotel.select_one("span.sr-hotel__name").text.strip()
   ....:         print(name)
   ....:         score = hotel.select_one("span.average.js--hp-scorecard-scoreval")
   ....:         print(score.text.strip())
   ....:         price = hotel.select_one("table div.sr-prc--num.sr-prc--final")
   ....:         print(price.text.strip() if price else "Unavailable")
   ....:     
The Oberoi Grand Kolkata
9.0
€ 113
Taj Bengal
9.0
€ 113
Sapphire Suites
7.4
Unavailable
The Gateway Hotel EM Bypass Kolkata
8.6
€ 84
The Lalit Great Eastern Kolkata
8.6
€ 101
Swissôtel Kolkata
8.5
€ 86
Kenilworth Hotel
8.5
€ 78
The Fern Residency Kolkata
8.4
€ 84
ITC Sonar Kolkata A Luxury Collection Hotel
8.3
€ 116
Hyatt Regency
8.3
€ 63
Treebo Platinum
8.2
€ 38
The Corner Courtyard
8.2
€ 73
Jameson Inn Shiraz
8.0
€ 58
The Sonnet
7.9
€ 80
Hotel Casa Fortuna
7.9
€ 56
Pipal Tree Hotel
7.9
€ 77
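
The loop above prints each field as it goes and falls back to "Unavailable" when the price node is missing. A minimal sketch of the same idea, collecting the fields into dictionaries instead, might look like the following. The HTML snippet is made up for illustration and only imitates the class names used in the selectors above; `text_or_default` and `parse_hotels` are hypothetical helper names, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Made-up markup that only imitates the class names from the selectors above.
# It is NOT the real booking.com page source.
sample_html = """
<div id="hotellist_inner">
  <div class="sr_item sr_item_new">
    <span class="sr-hotel__name"> Hotel A </span>
    <span class="average js--hp-scorecard-scoreval">9.0</span>
    <table><tr><td><div class="sr-prc--num sr-prc--final">€ 113</div></td></tr></table>
  </div>
  <div class="sr_item sr_item_new">
    <span class="sr-hotel__name">Hotel B</span>
    <span class="average js--hp-scorecard-scoreval">7.4</span>
  </div>
</div>
"""


def text_or_default(parent, selector, default="Unavailable"):
    """Return the stripped text of the first match, or `default` if nothing matches."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node else default


def parse_hotels(html):
    """Collect name, score and price for every hotel block in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for hotel in soup.select("#hotellist_inner div.sr_item.sr_item_new"):
        rows.append({
            "name": text_or_default(hotel, "span.sr-hotel__name"),
            "score": text_or_default(hotel, "span.average.js--hp-scorecard-scoreval"),
            "price": text_or_default(hotel, "table div.sr-prc--num.sr-prc--final"),
        })
    return rows


hotels = parse_hotels(sample_html)
for row in hotels:
    print(row["name"], row["score"], row["price"])
```

Routing every lookup through one None-safe helper also protects the name and score fields, which the loop above assumes are always present.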

Your selector soup.select('tr', class_='roomPrice') also uses incorrect syntax; it should be soup.select('tr.roomPrice').
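
To illustrate the difference, here is a small sketch on made-up markup (not booking.com's real HTML): `select` takes the class as part of a CSS selector string, while the `class_` keyword belongs to `find_all`. Either form should find the same rows.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example only.
html = """
<table>
  <tr class="roomPrice"><td>100</td></tr>
  <tr class="other"><td>n/a</td></tr>
  <tr class="roomPrice sale"><td>80</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS selector form: tag name and class in a single string.
by_css = soup.select("tr.roomPrice")

# Keyword form: class_ is an argument of find_all, not of select.
by_kwarg = soup.find_all("tr", class_="roomPrice")

css_prices = [row.get_text(strip=True) for row in by_css]
kwarg_prices = [row.get_text(strip=True) for row in by_kwarg]
print(css_prices)
print(kwarg_prices)
```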

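However, if you go to the page, you will see that the output above is not actually sorted by score. What we need to do is use the base URL and pass the parameters separately: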

In [20]: params = {'checkin_year_month':'2016-6',
   ....: 'checkout_monthday':'19',
   ....: 'checkout_year_month':'2016-6',
   ....: 'class_interval':'1',
   ....: 'dest_id':'-2092511',
   ....: 'dest_type':'city',
   ....: 'dtdisc':'0',
   ....: 'group_adults':'2',
   ....: 'group_children':'0',
   ....: 'hlrd':'0',
   ....: 'hyb_red':'0',
   ....: 'inac':'0',
   ....: 'label_click':'undef',
   ....: 'nflt':'ht_id=204;',
   ....: 'nha_red':'0',
   ....: 'no_rooms':'1',
   ....: 'offset':'0',
   ....: 'order':'score',
   ....: 'postcard':'0',
   ....: 'redirected_from_city':'0',
   ....: 'redirected_from_landmark':'0',
   ....: 'redirected_from_region':'0',
   ....: 'review_score_group':'empty',
   ....: 'room1':'A,A',
   ....: 'sb_price_type':'total',
   ....: 'sb_travel_purpose':'business',
   ....: 'score_min':'0',
   ....: 'src_elem':'sb',
   ....: 'ss':'Kolkata, West Bengal, India',
   ....: 'ss_all':'0',
   ....: 'ss_raw':'kolka',
   ....: 'ssb':'empty',
   ....: 'sshis':'0'}

In [21]: head = {"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"}

In [22]: url = "http://www.booking.com/searchresults.html"

In [23]: res = requests.get(url, params=params, headers=head)

In [24]: soup = BeautifulSoup(res.text,"html.parser")

In [25]: hotels = soup.select("#hotellist_inner div.sr_item.sr_item_new")

In [26]: for hotel in hotels:
   ....:         name = hotel.select_one("span.sr-hotel__name").text.strip()
   ....:         print(name)
   ....:         score = hotel.select_one("span.average.js--hp-scorecard-scoreval")
   ....:         print(score.text.strip())
   ....:         price = hotel.select_one("table div.sr-prc--num.sr-prc--final")
   ....:         print(price.text.strip() if price else "Unavailable")
   ....:     
The Oberoi Grand Kolkata
9.0
Unavailable
Taj Bengal
9.0
Unavailable
The Lalit Great Eastern Kolkata
8.6
Unavailable
The Gateway Hotel EM Bypass Kolkata
8.6
Unavailable
Swissôtel Kolkata
8.5
Unavailable
Kenilworth Hotel
8.5
Unavailable
The Fern Residency Kolkata
8.4
Unavailable
ITC Sonar Kolkata A Luxury Collection Hotel
8.3
Unavailable
Hyatt Regency
8.3
Unavailable
Treebo Platinum
8.2
Unavailable
The Corner Courtyard
8.2
Unavailable
Monovilla Inn
8.1
Unavailable
Jameson Inn Shiraz
8.0
Unavailable
The Sonnet
7.9
Unavailable
Hotel Casa Fortuna
7.9
Unavailable

With this request the prices come back hidden, so we need to add some more logic; I will edit the answer in a bit.
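
As a side note on the `params` approach: passing a dictionary lets the library handle percent-encoding of values such as `ht_id=204;` instead of pasting a pre-encoded URL. The sketch below previews that encoding with the standard library's `urlencode`, using only a subset of the parameters shown earlier. It assumes requests encodes the query in a comparable way and makes no network call.

```python
from urllib.parse import urlencode, parse_qs

base_url = "http://www.booking.com/searchresults.html"

# A subset of the parameters from the answer, chosen for illustration.
params = {
    "dest_id": "-2092511",
    "dest_type": "city",
    "nflt": "ht_id=204;",
    "order": "score",
    "ss": "Kolkata, West Bengal, India",
}

# Build the query string; reserved characters in values are percent-encoded.
query = urlencode(params)
full_url = base_url + "?" + query
print(full_url)

# Decoding the query again should give back the original values.
decoded = {key: values[0] for key, values in parse_qs(query).items()}
print(decoded == params)
```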