我想使用网络抓取来提取有关在学生住宿网站uniplace上列出的信息。这是一个示例清单: [https://www.uniplaces.com/accommodation/berlin/92342][1]
我想提取一些信息,例如价格,#个浴室,#个室友,...
但是,使用我在网上找到的不同方法,我无法提取完整 html代码。总是缺少包含相关信息的小节。在网站上,您可以使用小箭头打开这些小节。我是html的新手,所以我不明白为什么无法将其撤消。
这是我尝试过的代码:
from selenium import webdriver
from bs4 import BeautifulSoup
options = webdriver.ChromeOptions()
options.add_argument('headless')
driver= webdriver.Chrome(chrome_options=options,executable_path=r'path/chromedriver.exe')
driver.get('https://www.uniplaces.com/accommodation/berlin/92342')
html_doc = driver.page_source
soup= BeautifulSoup(html_doc,'lxml')
print (soup.prettify())
及其变化形式:
import urllib.request
fp= urllib.request.urlopen("https://www.uniplaces.com/accommodation/berlin/92342")
mybytes = fp.read()
mystr = mybytes.decode("utf8")
fp.close()
print(mystr)
如果有人可以提供帮助,我将非常感谢任何提示和技巧!
祝一切顺利, 汉娜
答案 0 :(得分:1)
该网站使用内部GraphQL API,可从
访问https://offer-aggregate-graphql.uniplaces.com/graphql
GraphQL是一种查询语言,可让您选择要查询的字段。这对您来说非常方便,因为您可能希望按照问题中的建议访问特定信息。
以下示例查询要约价格,条件(包括最大人数)和住宿类型(面积,卧室和浴室的数量):
import requests
id = "92342"
query = """
query($id: ID!) {
offerAggregate(id: $id) {
accommodation_offer {
reference_price {
amount
currency_code
}
requisites {
conditions {
cancellation_policy
minimum_nights
max_guests
}
}
}
property_aggregate {
property {
typology {
area
number_of_bedrooms
number_of_bathrooms
}
}
}
}
}
"""
resp = requests.post(
'https://offer-aggregate-graphql.uniplaces.com/graphql',
json={
"query": query,
"variables": {
"id": id
}
}
)
body = resp.json()
print(body)
您可以了解有关GraphQL查询here
的更多信息要约页面中使用的初始请求很大,您只需要选择要查询的字段即可。这是使用curl的查询:
curl 'https://offer-aggregate-graphql.uniplaces.com/graphql' \
-H 'content-type: application/json' \
--data-binary '{"query":"fragment PhotosFragment on Photos {\n id\n hash\n placeholder\n metadata {\n internal_label\n __typename\n }\n __typename\n}\n\nfragment PropertyLocationFragment on PropertyLocation {\n neighborhood_id\n geo {\n latitude\n longitude\n __typename\n }\n address {\n street\n city_code\n number\n postal_code\n extra\n __typename\n }\n __typename\n}\n\nfragment PropertyAggregateFragment on PropertyAggregate {\n property {\n id\n external_reference {\n human_reference\n api_reference\n __typename\n }\n landlord_resident {\n gender\n age_range\n occupation\n pets\n family\n __typename\n }\n features {\n Code\n Exists\n __typename\n }\n floors {\n units {\n id\n area\n photos {\n id\n displayable\n __typename\n }\n features {\n Code\n Exists\n __typename\n }\n subunits {\n id\n type_code\n features {\n Code\n Exists\n __typename\n }\n photos {\n id\n displayable\n __typename\n }\n __typename\n }\n type_code\n __typename\n }\n __typename\n }\n lifecycle {\n rent_by\n out_of_platform {\n out\n __typename\n }\n __typename\n }\n location {\n ...PropertyLocationFragment\n __typename\n }\n main_features {\n gas_type\n __typename\n }\n metadata {\n locale_code\n text\n main\n __typename\n }\n photos {\n id\n displayable\n __typename\n }\n restrictions {\n occupation\n origin\n __typename\n }\n rules {\n code\n exists\n __typename\n }\n typology {\n area\n accommodation_type_code\n type_code\n number_of_bedrooms\n number_of_bathrooms\n __typename\n }\n verification {\n verified\n __typename\n }\n video {\n url\n __typename\n }\n __typename\n }\n neighborhood {\n id\n city_code\n slug\n __typename\n }\n __typename\n}\n\nfragment AccommodationOfferBillFragment on AccommodationOfferBill {\n included\n maximum {\n ...AccommodationOfferBillMaximumFragment\n __typename\n }\n __typename\n}\n\nfragment AccommodationOfferBillMaximumFragment on AccommodationOfferBillMaximum {\n capped\n max {\n amount\n currency_code\n __typename\n }\n __typename\n}\n\nfragment AccommodationOfferCostsFragment on AccommodationOfferCosts {\n bills {\n maximum {\n ...AccommodationOfferBillMaximumFragment\n __typename\n }\n water {\n ...AccommodationOfferBillFragment\n __typename\n }\n electricity {\n ...AccommodationOfferBillFragment\n __typename\n }\n gas {\n ...AccommodationOfferBillFragment\n __typename\n }\n internet {\n ...AccommodationOfferBillFragment\n __typename\n }\n __typename\n }\n services {\n cleaning {\n periodicity\n type\n __typename\n }\n __typename\n }\n __typename\n}\n\nfragment AccommodationOfferPropertyFragment on AccommodationOfferProperty {\n unitary\n number_of_units\n property_id\n unit_id\n photos_unit_id\n subunit_id\n __typename\n}\n\nfragment AccommodationOfferContractOptionFragment on AccommodationOfferContractOption {\n id\n start_date\n end_date\n contract_value {\n amount\n currency_code\n __typename\n }\n instalments {\n date\n value {\n amount\n currency_code\n __typename\n }\n __typename\n }\n number_of_instalments\n __typename\n}\n\nfragment AccommodationOfferContractStandardFragment on AccommodationOfferContractStandard {\n extra_after\n penalty {\n nights_threshold\n type\n percentage\n value {\n amount\n currency_code\n __typename\n }\n __typename\n }\n extra_per_guest {\n amount\n currency_code\n __typename\n }\n rents {\n amount\n currency_code\n __typename\n }\n __typename\n}\n\nfragment AccommodationOfferContractFragment on AccommodationOfferContract {\n type\n exclusive\n is_instant_booking\n commission\n deposit {\n pay_to\n type\n value {\n amount\n currency_code\n __typename\n }\n __typename\n }\n admin_fee {\n exact_value\n value {\n amount\n currency_code\n __typename\n }\n __typename\n }\n variable_admin_fee {\n default_admin_fee {\n exact_value\n value {\n amount\n currency_code\n __typename\n }\n __typename\n }\n levels {\n exact_value\n value {\n amount\n currency_code\n __typename\n }\n until\n __typename\n }\n __typename\n }\n fixed {\n options {\n ...AccommodationOfferContractOptionFragment\n __typename\n }\n __typename\n }\n fixed_unitary {\n options {\n ...AccommodationOfferContractOptionFragment\n __typename\n }\n extra_after\n extra_per_guest {\n amount\n currency_code\n __typename\n }\n __typename\n }\n standard {\n ...AccommodationOfferContractStandardFragment\n __typename\n }\n __typename\n}\n\nfragment AccommodationOfferRequisitesFragment on AccommodationOfferRequisites {\n requirements {\n offline_id\n guarantor\n contract\n __typename\n }\n conditions {\n cancellation_policy\n minimum_nights\n max_guests\n __typename\n }\n __typename\n}\n\nfragment AccommodationOfferTitleFragment on AccommodationOfferTitle {\n locale_code\n text\n main\n __typename\n}\n\nfragment AccommodationOfferAvailabilityFragment on AccommodationOfferAvailability {\n standard_unitary_contract {\n available_from\n last_updated_at\n __typename\n }\n standard_contract {\n available_from\n last_updated_at\n __typename\n }\n fixed_contract {\n available_from\n last_updated_at\n __typename\n }\n __typename\n}\n\nfragment AccommodationOfferAvailabilitiesStandardFragment on AccommodationOfferAvailabilitiesStandard {\n available_periods {\n start_date\n end_date\n __typename\n }\n years {\n year\n months {\n Jan\n Feb\n Mar\n Apr\n May\n Jun\n Jul\n Aug\n Sep\n Oct\n Nov\n Dec\n __typename\n }\n __typename\n }\n __typename\n}\n\nfragment AccommodationOfferAvailabilitiesStandardUnitaryFragment on AccommodationOfferAvailabilitiesStandardUnitary {\n available_periods {\n start_date\n end_date\n __typename\n }\n blocked_intervals {\n start_date\n end_date\n by\n extra_info\n __typename\n }\n __typename\n}\n\nfragment AccommodationOfferAvailabilitiesFixedFragment on AccommodationOfferAvailabilitiesFixed {\n options {\n id\n status\n __typename\n }\n __typename\n}\n\nfragment AccommodationOfferAvailabilitiesFragment on AccommodationOfferAvailabilities {\n standard {\n ...AccommodationOfferAvailabilitiesStandardFragment\n __typename\n }\n standard_unitary {\n ...AccommodationOfferAvailabilitiesStandardUnitaryFragment\n __typename\n }\n fixed {\n ...AccommodationOfferAvailabilitiesFixedFragment\n __typename\n }\n fixed_unitary {\n ...AccommodationOfferAvailabilitiesFixedFragment\n __typename\n }\n __typename\n}\n\nfragment AccommodationOfferFragment on AccommodationOffer {\n id\n version\n parent\n accommodation_provider_id\n property {\n ...AccommodationOfferPropertyFragment\n __typename\n }\n title {\n ...AccommodationOfferTitleFragment\n __typename\n }\n costs {\n ...AccommodationOfferCostsFragment\n __typename\n }\n requisites {\n ...AccommodationOfferRequisitesFragment\n __typename\n }\n availability_summary_info {\n ...AccommodationOfferAvailabilityFragment\n __typename\n }\n availabilities {\n ...AccommodationOfferAvailabilitiesFragment\n __typename\n }\n lifecycle {\n published {\n published\n __typename\n }\n __typename\n }\n restrictions {\n gender\n occupancy\n __typename\n }\n contract {\n ...AccommodationOfferContractFragment\n __typename\n }\n floor_plan {\n name\n __typename\n }\n main_photo {\n id\n __typename\n }\n reference_price {\n amount\n currency_code\n __typename\n }\n __typename\n}\n\nfragment AccommodationProviderFragment on AccommodationProvider {\n id\n booking {\n gap_on_booking {\n soft_maximum\n hard_maximum\n __typename\n }\n __typename\n }\n verifications {\n email_address\n phone\n offline_id\n __typename\n }\n basic_info {\n preference_settings {\n locale_code\n __typename\n }\n __typename\n }\n account_management {\n key_account\n __typename\n }\n stats {\n bookings {\n accepted {\n total\n __typename\n }\n requested {\n total\n __typename\n }\n rejected {\n total\n __typename\n }\n confirmed {\n total\n __typename\n }\n __typename\n }\n response_time\n __typename\n }\n created {\n at\n __typename\n }\n __typename\n}\n\nfragment GlobalizationCityFragment on GlobalizationCity {\n code\n configuration {\n slug\n __typename\n }\n metadata {\n name_translations {\n locale_code\n text\n __typename\n }\n __typename\n }\n __typename\n}\n\nfragment GlobalizationCountryFragment on GlobalizationCountry {\n code\n metadata {\n name_translations {\n locale_code\n text\n __typename\n }\n __typename\n }\n __typename\n}\n\nfragment GlobalizationAggregateFragment on GlobalizationAggregate {\n city {\n ...GlobalizationCityFragment\n __typename\n }\n country {\n ...GlobalizationCountryFragment\n __typename\n }\n __typename\n}\n\nquery offerAggregate($id: ID!, $useCache: Boolean) {\n offerAggregate(id: $id, useCache: $useCache) {\n id\n units_sorted {\n unit_id\n __typename\n }\n photos {\n ...PhotosFragment\n __typename\n }\n property_aggregate {\n ...PropertyAggregateFragment\n __typename\n }\n accommodation_offer {\n ...AccommodationOfferFragment\n __typename\n }\n accommodation_provider {\n ...AccommodationProviderFragment\n __typename\n }\n globalization_aggregate {\n ...GlobalizationAggregateFragment\n __typename\n }\n __typename\n }\n}\n","variables":{"id":"92342"},"operationName":"offerAggregate"}'