I have built a very simple scraper that looks at Airbnb listings. The goal is to go through a given site (namely this one).
    import requests
    from bs4 import BeautifulSoup

    first_page = BeautifulSoup(requests.get("https://www.airbnb.com/s/Copenhagen--Denmark/homes?allow_override%5B%5D=&s_tag=kHqeQTpz&section_offset=1").text, 'html.parser')
    listings = first_page.find_all('div', 'listing-card-wrapper')
    for listing in listings:
        print(listing.select("#listing-15616363 > div.infoContainer_v72lrv > a > div.ellipsized_1iurgbx > div > span:nth-child(1) > span:nth-child(1)"))
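The code correctly iterates over the 18 elements on the page. However, it prints 18 empty lists, which suggests the listing.select call is not matching anything. I got the CSS selector from the "Copy selector" feature in Chrome DevTools.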
Answer 0 (score: 3)
That is because listing-15616363 is specific to a single listing (note the format listing-{listing_id}), so the other listings in your loop contain no element with id = 'listing-15616363'.
For example, if you want to get the URL, you can do this:
    listing.find('a', class_="linkContainer_55zci1")['href']
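To make the difference concrete, here is a minimal sketch on a made-up HTML snippet (the two cards, their ids, and their hrefs are assumptions for illustration, not the real Airbnb markup). It contrasts a selector that hard-codes one listing's id with a lookup that is relative to each card:

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified markup: two cards, each with its own id.
SAMPLE_HTML = """
<div class="listing-card-wrapper">
  <div id="listing-101"><a class="linkContainer_55zci1" href="/rooms/101">First</a></div>
</div>
<div class="listing-card-wrapper">
  <div id="listing-102"><a class="linkContainer_55zci1" href="/rooms/102">Second</a></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
listings = soup.find_all("div", "listing-card-wrapper")

# A selector that hard-codes one listing's id can only match inside that card.
by_id = [listing.select("#listing-101 a") for listing in listings]

# A lookup relative to each card works for every card.
hrefs = [listing.find("a", class_="linkContainer_55zci1")["href"] for listing in listings]

print([len(matches) for matches in by_id])  # [1, 0]
print(hrefs)                                # ['/rooms/101', '/rooms/102']
```

The selector copied from DevTools describes the path to one specific element, so it is a poor fit for a loop over many cards; class or attribute lookups relative to the card generalise better.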
Alternatively, you can use Python's lxml, which (if used properly) is an order of magnitude faster than BeautifulSoup, like this:
    import requests
    from lxml import html

    url = "https://www.airbnb.com/s/Copenhagen--Denmark/homes?allow_override%5B%5D=&s_tag=kHqeQTpz&section_offset=1"
    response = requests.get(url)
    root = html.fromstring(response.content)
    result_list = []

    def remove_non_ascii(text):
        # Drop non-ASCII characters (e.g. a currency symbol in front of the price).
        return ''.join(ch for ch in text if ord(ch) < 128)

    # The currency code is read once from the page's offer metadata.
    currency = root.xpath('//div[@itemprop="offers"]/meta[@itemprop="priceCurrency"]/@content')[0].strip()

    for row in root.xpath('//div[contains(@class, "listing-card-wrapper")]'):
        if len(row) == 0:  # skip empty wrappers; len() avoids lxml's deprecated truth test
            continue
        # XPaths starting with ".//" are evaluated relative to the current card.
        href = row.xpath('.//a[@class="linkContainer_55zci1"]/@href')[0].strip()
        title = row.xpath('.//div[@class="ellipsized_1iurgbx"]/span/text()')[0].strip()
        price = remove_non_ascii(row.xpath('.//div[@class="inline_g86r3e"]/span//text()')[0].strip())
        result_list.append({'url': "https://www.airbnb.com" + href,
                            'title': title, 'price': price, 'currency': currency})

    print(result_list)
This results in the following (the u'' prefixes appear because the output was produced under Python 2):
[{'url': 'https://www.airbnb.com/rooms/5316912', 'currency': 'INR', 'price': u' 3,823', 'title': 'Small City apt. next to the Metro'},
 {'url': 'https://www.airbnb.com/rooms/16989400', 'currency': 'INR', 'price': u' 2,347', 'title': 'Cozy room close to city center'},
 {'url': 'https://www.airbnb.com/rooms/17628374', 'currency': 'INR', 'price': u' 6,774', 'title': 'Cosy, quiet apartment in downtown Copenhagen'},
 {'url': 'https://www.airbnb.com/rooms/1206721', 'currency': 'INR', 'price': u' 4,426', 'title': 'Apt.close to Metro, Airport and CHP'},
 {'url': 'https://www.airbnb.com/rooms/13813273', 'currency': 'INR', 'price': u' 3,622', 'title': 'Large room in Vesterbro'},
 {'url': 'https://www.airbnb.com/rooms/14083881', 'currency': 'INR', 'price': u' 9,322', 'title': 'City Room'},
 {'url': 'https://www.airbnb.com/rooms/6221130', 'currency': 'INR', 'price': u' 5,365', 'title': 'cosy flat 2 min from Central Statio'},
 {'url': 'https://www.airbnb.com/rooms/15804159', 'currency': 'INR', 'price': u' 3,823', 'title': 'Cozy, central near waterfront. Quality breakfast!'},
 {'url': 'https://www.airbnb.com/rooms/17266268', 'currency': 'INR', 'price': u' 3,756', 'title': 'Cosy room in Frederiksberg'},
 {'url': 'https://www.airbnb.com/rooms/2647233', 'currency': 'INR', 'price': u' 3,353', 'title': 'Bedroom & Living Room Frederiksberg'},
 {'url': 'https://www.airbnb.com/rooms/12083235', 'currency': 'INR', 'price': u' 5,969', 'title': 'Wonderful Copenhagen is right here'},
 {'url': 'https://www.airbnb.com/rooms/7787976', 'currency': 'INR', 'price': u' 7,042', 'title': 'Homely renovated flat with garden'},
 {'url': 'https://www.airbnb.com/rooms/17556785', 'currency': 'INR', 'price': u' 1,610', 'title': u'Small Cosy home above our Caf\xe9 ( Breakfast incl )'},
 {'url': 'https://www.airbnb.com/rooms/894420', 'currency': 'INR', 'price': u' 10,261', 'title': 'Wonderful apt. right in the city!'},
 {'url': 'https://www.airbnb.com/rooms/17028460', 'currency': 'INR', 'price': u' 7,847', 'title': 'Nyhavn 3-bed apartment for families'},
 {'url': 'https://www.airbnb.com/rooms/17651114', 'currency': 'INR', 'price': u' 6,371', 'title': 'Spacious place by canals in heart of Copenhagen'},
 {'url': 'https://www.airbnb.com/rooms/10564051', 'currency': 'INR', 'price': u' 3,420', 'title': u'\u623f\u95f4\u5728\u54e5\u672c\u54c8\u6839\u7684\u5fc3\u810f'},
 {'url': 'https://www.airbnb.com/rooms/17709435', 'currency': 'INR', 'price': u' 2,951', 'title': u'Hyggelig lejlighed t\xe6t p\xe5 centrum.'}]
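One caveat with indexing XPath results using [0]: it raises IndexError as soon as a card lacks one of the expected nodes. A small guard can help here. The sketch below uses a hypothetical helper, first_or_default, and a made-up two-card snippet (both are assumptions for illustration, not part of lxml or of the real page):

```python
from lxml import html

def first_or_default(node, xpath, default=None):
    """Return the first XPath string result, stripped, or `default` if nothing matches."""
    matches = node.xpath(xpath)
    return matches[0].strip() if matches else default

# Hypothetical markup: the second card has no title element.
SAMPLE = """
<div>
  <div class="listing-card-wrapper">
    <a class="linkContainer_55zci1" href="/rooms/1"></a>
    <div class="ellipsized_1iurgbx"><span>Small flat</span></div>
  </div>
  <div class="listing-card-wrapper">
    <a class="linkContainer_55zci1" href="/rooms/2"></a>
  </div>
</div>
"""

root = html.fromstring(SAMPLE)
rows = root.xpath('//div[contains(@class, "listing-card-wrapper")]')

# Missing fields become None instead of aborting the whole scrape.
results = [
    {
        "url": first_or_default(row, './/a[@class="linkContainer_55zci1"]/@href'),
        "title": first_or_default(row, './/div[@class="ellipsized_1iurgbx"]/span/text()'),
    }
    for row in rows
]
print(results)
```

Whether to skip such cards or keep them with empty fields depends on what you do with the data afterwards.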
Answer 1 (score: 1)
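When web scraping, try to use XPath or specific element attributes rather than copied CSS selectors, because those selectors are usually too specific to a single element.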
Instead of CSS selectors, I got what you are after by using the itemprop attribute, as in the following code:
Code:
    from bs4 import BeautifulSoup
    import requests

    html_source = requests.get("https://www.airbnb.com/s/Copenhagen--Denmark/homes?allow_override%5B%5D=&s_tag=kHqeQTpz&section_offset=1").text
    first_page = BeautifulSoup(html_source, 'html.parser')
    # Each search result is marked up with itemprop="itemListElement".
    listings = first_page.find_all('div', {'itemprop': 'itemListElement'})
    for listing in listings:
        # The next three <meta> tags hold the name, position and URL, in that order.
        a = listing.find_next('meta')
        b = a.find_next('meta')
        c = b.find_next('meta')
        print("Name:", a['content'])
        print("Position:", b['content'])
        print("URL:", c['content'])
        print("-" * 15)
Output:
Name: Small City apt. next to the Metro - Apartment - København
Position: 1
URL: www.airbnb.com/rooms/5316912
---------------
Name: Cozy room close to city center - Apartment - Frederiksberg
Position: 2
URL: www.airbnb.com/rooms/16989400
---------------
Name: Cosy, quiet apartment in downtown Copenhagen - Apartment - København
Position: 3
URL: www.airbnb.com/rooms/17628374
---------------
Name: Apt.close to Metro, Airport and CHP - Apartment - Copenhagen
Position: 4
URL: www.airbnb.com/rooms/1206721
---------------
Name: Large room in Vesterbro - Apartment - København
Position: 5
URL: www.airbnb.com/rooms/13813273
---------------
Name: City Room - Apartment - København
Position: 6
URL: www.airbnb.com/rooms/14083881
---------------
Name: cosy flat 2 min from Central Statio - Apartment - København V
Position: 7
URL: www.airbnb.com/rooms/6221130
---------------
Name: Cozy, central near waterfront. Quality breakfast! - Apartment - København
Position: 8
URL: www.airbnb.com/rooms/15804159
---------------
Name: Cosy room in Frederiksberg - Apartment - Frederiksberg
Position: 9
URL: www.airbnb.com/rooms/17266268
---------------
Name: Bedroom & Living Room Frederiksberg - Apartment - Frederiksberg
Position: 10
URL: www.airbnb.com/rooms/2647233
---------------
Name: Wonderful Copenhagen is right here - Apartment - København
Position: 11
URL: www.airbnb.com/rooms/12083235
---------------
Name: Homely renovated flat with garden - Apartment - Frederiksberg
Position: 12
URL: www.airbnb.com/rooms/7787976
---------------
Name: Small Cosy home above our Café ( Breakfast incl ) - Bed & Breakfast - København
Position: 13
URL: www.airbnb.com/rooms/17556785
---------------
Name: Wonderful apt. right in the city! - Apartment - Copenhagen
Position: 14
URL: www.airbnb.com/rooms/894420
---------------
Name: Nyhavn 3-bed apartment for families - Apartment - Copenhagen
Position: 15
URL: www.airbnb.com/rooms/17028460
---------------
Name: Spacious place by canals in heart of Copenhagen - Apartment - København
Position: 16
URL: www.airbnb.com/rooms/17651114
---------------
Name: 房间在哥本哈根的心脏 - Apartment - København
Position: 17
URL: www.airbnb.com/rooms/10564051
---------------
Name: Hyggelig lejlighed tæt på centrum. - Apartment - København
Position: 18
URL: www.airbnb.com/rooms/17709435
---------------
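The find_next('meta') chain relies on the three tags appearing in a fixed order. If each <meta> tag carries its own itemprop value, it may be more robust to look the tags up by name. The sketch below assumes the values are "name", "position" and "url" (a guess based on the printed labels, not confirmed against the real page) and runs on a made-up snippet with a hypothetical helper, meta_content:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; note the different tag order in the second listing.
SAMPLE_HTML = """
<div itemprop="itemListElement">
  <meta itemprop="name" content="City Room - Apartment">
  <meta itemprop="position" content="1">
  <meta itemprop="url" content="www.airbnb.com/rooms/1">
</div>
<div itemprop="itemListElement">
  <meta itemprop="position" content="2">
  <meta itemprop="name" content="Large room - Apartment">
  <meta itemprop="url" content="www.airbnb.com/rooms/2">
</div>
"""

def meta_content(listing, prop):
    """Return the content of the <meta itemprop=prop> tag inside `listing`, or None."""
    tag = listing.find("meta", {"itemprop": prop})
    return tag["content"] if tag is not None else None

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
records = [
    {prop: meta_content(listing, prop) for prop in ("name", "position", "url")}
    for listing in soup.find_all("div", {"itemprop": "itemListElement"})
]

for record in records:
    print(record)
```

Because each lookup is keyed on the attribute value, a change in tag order does not silently swap the fields.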