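I want to search Airbnb listings by city (the five cities listed in the code) and collect the following information: price, listing link, room type, number of guests, and so on.
I am able to get the link, but I can't get the price.
Any help would be much appreciated.
Thanks!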
from bs4 import BeautifulSoup
import requests
import csv
from urllib.parse import urljoin # For joining next page url with base url
from datetime import datetime # For inserting the current date and time
start_url_nyc = "https://www.airbnb.com/s/New-York--NY--United-States"
start_url_mia = "https://www.airbnb.com/s/Miami--FL--United-States"
start_url_la = "https://www.airbnb.com/s/Los_Angeles--CA--United-States"
start_url_sf = "https://www.airbnb.com/s/San_Francisco--CA--United-States"
start_url_orl = "https://www.airbnb.com/s/Orlando--FL--United-States"
def scrape_airbnb(url):
    # Set up the URL Request
    headers = {'User-Agent': 'Mozilla/5.0'}
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, "html.parser")
    # Iterate over search results
    for search_result in soup.find_all('div', 'infoContainer_tfq3vd'):
        # Parse the name and price and record the time
        link_end = search_result.find('a').get('href')
        link = "https://www.airbnb.com" + link_end
        price = search_result.find('span', 'data-pricerate').find('data-reactid').get(int)
    return (price)
print(scrape_airbnb(start_url_orl))
Answer 0 (score: 0)
This is the HTML:
<span data-pricerate="true" data-reactid=".91165im9kw.0.2.0.3.2.1.0.$0.$grid_0.$0/=1$=01$16085565.$=1$16085565.0.2.0.1.0.0.0.1:1">552</span>
This is your code:
price = search_result.find('span', 'data-pricerate').find('data-reactid').get(int)
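First, a bare string passed as the second positional argument to find() is matched against the CSS class, not against attribute names, so find('span', 'data-pricerate') looks for class="data-pricerate" and returns None. The Beautiful Soup documentation also notes:
Some attributes, like the data-* attributes in HTML 5, have names that can't be used as the names of keyword arguments: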
data_soup = BeautifulSoup('<div data-foo="value">foo!</div>')
data_soup.find_all(data-foo="value")
# SyntaxError: keyword can't be an expression
You can use these attributes in searches by putting them into a dictionary and passing the dictionary into find_all() as the attrs argument:
data_soup.find_all(attrs={"data-foo": "value"})
# [<div data-foo="value">foo!</div>]
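To see the difference on a small inline snippet (the markup below is made up for illustration and is not taken from Airbnb), compare a positional string, which Beautiful Soup matches against the class attribute, with an attrs dictionary:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a span that has both a class and a data-* attribute.
html = '<div><span class="price" data-pricerate="true">552</span></div>'
soup = BeautifulSoup(html, "html.parser")

# A bare string as the second positional argument is treated as a CSS class,
# so this searches for class="data-pricerate" and finds nothing.
by_class = soup.find("span", "data-pricerate")

# An attrs dictionary matches on the attribute name and value.
by_attr = soup.find("span", attrs={"data-pricerate": "true"})

print(by_class)      # None
print(by_attr.text)  # 552
```

Because the first lookup returns None, chaining another call onto it raises an AttributeError, which is why the original line fails.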
Then:
price = search_result.find('span', attrs={"data-pricerate":"true"})
This returns the span tag that contains the price. To get the price as a string, just use .text:
price = search_result.find('span', attrs={"data-pricerate":"true"}).text
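Putting the pieces together, here is a minimal sketch of a parser that collects the link and price of every result. It assumes the page still uses the `infoContainer_tfq3vd` class and the `data-pricerate` span from the question. The function name `parse_listings`, the constant `BASE_URL`, and the sample markup (including the `/rooms/...` path) are hypothetical, introduced only for illustration. It works on an HTML string so it can be tried without a network request.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.airbnb.com"


def parse_listings(html):
    """Return a list of {"link", "price"} dicts found in a search-results page."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    # Class name taken from the question; it may not match the live site.
    for result in soup.find_all("div", "infoContainer_tfq3vd"):
        anchor = result.find("a")
        price_tag = result.find("span", attrs={"data-pricerate": "true"})
        if anchor is None or price_tag is None:
            continue  # skip results that lack a link or a price
        listings.append({
            "link": urljoin(BASE_URL, anchor.get("href")),
            # Assumes the span holds only digits, as in the question's HTML.
            "price": int(price_tag.text),
        })
    return listings


# Hypothetical sample shaped like the span shown in the question.
sample = """
<div class="infoContainer_tfq3vd">
  <a href="/rooms/16085565">Listing</a>
  <span data-pricerate="true" data-reactid="x">552</span>
</div>
"""
print(parse_listings(sample))
```

Returning a list instead of a single value keeps every result on the page. In the real scraper, `response.text` from `requests.get` would be passed in place of `sample`.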