I have the following code to print the number of listings on an Airbnb page:
import requests, bs4
url = 'https://www.airbnb.pl/s/Girona--Hiszpania/homes?place_id=ChIJRRrTHsPNuhIRQMqjIeD6AAM&query=Girona%2C%20Hiszpania&refinement_paths%5B%5D=%2Fhomes&allow_override%5B%5D=&s_tag=b5bnciXv'
response = requests.get(url)
soup = bs4.BeautifulSoup(response.text)
listings = soup.select('._f21qs6')
print(len(listings))
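The printed output should be "18", which is the number of listings on this page.
However, the output is inconsistent. Sometimes I do get "18", but other times it is "0".
Is there anything I can improve in this code to make the output more consistent?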
Edit: I refactored the code to make the inconsistency in the output visible:
import requests, bs4
def get_listings():
    url = 'https://www.airbnb.pl/s/Girona--Hiszpania/homes?place_id=ChIJRRrTHsPNuhIRQMqjIeD6AAM&query=Girona%2C%20Hiszpania&refinement_paths%5B%5D=%2Fhomes&allow_override%5B%5D=&s_tag=b5bnciXv'
    response = requests.get(url)
    soup = bs4.BeautifulSoup(response.text, "html.parser")
    listings = soup.select('._f21qs6')
    return listings

def check_if_all_listings_downloaded():
    number_of_listings = len(get_listings())
    print("Current number of listings: " + str(number_of_listings))
    while number_of_listings != 18:
        print("Too few listings: " + str(number_of_listings))
        number_of_listings = len(get_listings())
    print("All fine! The number of listings is: " + str(number_of_listings))

check_if_all_listings_downloaded()
Sample output of this refactored code:
Current number of listings: 0
Too few listings: 0
Too few listings: 0
Too few listings: 16
All fine! The number of listings is: 18
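The polling loop in check_if_all_listings_downloaded retries forever and without any pause if the page never returns 18 listings. A minimal sketch of a bounded retry with a growing delay is shown here. The helper name retry_until and its parameters are hypothetical, not part of requests or bs4, and the sketch assumes that simply re-requesting the page is acceptable.

```python
import time


def retry_until(fetch, is_complete, max_attempts=5, delay_seconds=1.0,
                sleep=time.sleep):
    """Call fetch() until is_complete(result) is true or attempts run out.

    Hypothetical helper for illustration. Returns (last_result, attempts_used).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    result = None
    for attempt in range(1, max_attempts + 1):
        result = fetch()
        if is_complete(result):
            return result, attempt
        if attempt < max_attempts:
            # Wait a little longer after each failed attempt.
            sleep(delay_seconds * attempt)
    # Attempts exhausted: hand back the last result so the caller can decide.
    return result, max_attempts
```

With the get_listings function from the question, this could be called as retry_until(get_listings, lambda found: len(found) == 18). The caller still has to check the length of the returned list, because the helper gives up after max_attempts rather than looping indefinitely.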
Answer 0 (score: 0)
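Specify the parser here: bs4.BeautifulSoup(response.text, 'html.parser')
Use the HTML parser instead of lxml. I tried your code with the HTML parser and the output seemed consistent. I'm not sure why lxml shows the inconsistency. See "Difference Between Parsers" for how the parsers differ.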
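The refactored code in the question already passes "html.parser" and still reports varying counts, so it may be worth checking whether the response itself differs between requests. A minimal sketch of such a check, using only the standard library so that it is independent of the bs4 parser choice, is shown here. The names ClassCounter and count_class are hypothetical, and the class name is the one from the question.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Return how many elements in the raw HTML carry class_name."""
    counter = ClassCounter(class_name)
    counter.feed(html)
    counter.close()
    return counter.count
```

Calling count_class(response.text, '_f21qs6') next to len(soup.select('._f21qs6')) on the same response would show whether the two disagree. If both report 0 for a given response, the listings were probably not in that HTML at all, which would point at the server response rather than the parser.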