Question

I have the following code that prints the number of listings on an Airbnb page:

import requests, bs4

url = 'https://www.airbnb.pl/s/Girona--Hiszpania/homes?place_id=ChIJRRrTHsPNuhIRQMqjIeD6AAM&query=Girona%2C%20Hiszpania&refinement_paths%5B%5D=%2Fhomes&allow_override%5B%5D=&s_tag=b5bnciXv'
response = requests.get(url)
soup = bs4.BeautifulSoup(response.text)

listings = soup.select('._f21qs6')

print(len(listings))

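The printed output should be "18", which is the number of listings on this page.

However, the output is inconsistent. Sometimes I do get "18", but other times it is "0".

Is there anything I can improve in this code to make the output more consistent?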

Edit: I refactored the code to make the inconsistency in the output obvious:

import requests, bs4

def get_listings():
    url = 'https://www.airbnb.pl/s/Girona--Hiszpania/homes?place_id=ChIJRRrTHsPNuhIRQMqjIeD6AAM&query=Girona%2C%20Hiszpania&refinement_paths%5B%5D=%2Fhomes&allow_override%5B%5D=&s_tag=b5bnciXv'
    response = requests.get(url)
    soup = bs4.BeautifulSoup(response.text, "html.parser")
    listings = soup.select('._f21qs6')
    return listings

def check_if_all_listings_downloaded():
    number_of_listings = len(get_listings())
    print("Current number of listings: " + str(number_of_listings))
    while number_of_listings != 18:
        print("Too few listings: " + str(number_of_listings))
        number_of_listings = len(get_listings())
    print("All fine! The number of listings is: " + str(number_of_listings))

check_if_all_listings_downloaded()

Sample output of the refactored code:

Current number of listings: 0
Too few listings: 0
Too few listings: 0
Too few listings: 16
All fine! The number of listings is: 18

Answer 1

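Specify the parser explicitly: bs4.BeautifulSoup(response.text, 'html.parser') uses Python's built-in HTML parser instead of lxml. I tried your code with the HTML parser and the output seemed consistent. I'm not sure why lxml gives inconsistent results. See the "Difference Between Parsers" section of the BeautifulSoup documentation for how the parsers differ.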
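
As a rough sketch of the suggestion above: when no parser is passed, BeautifulSoup picks whichever installed parser it ranks best (lxml if available), so naming 'html.parser' pins the behaviour. The edited code in the question already passes 'html.parser' and still reports varying counts, so this sketch also retries a bounded number of times in case the response itself differs between requests. The names count_listings, fetch_listing_count, max_attempts, delay_seconds and the injectable get parameter are hypothetical, made up for this illustration; the selector and the expected count of 18 come from the question.

```python
import time

import bs4
import requests

# Class name copied from the question; Airbnb may change it at any time.
LISTING_SELECTOR = "._f21qs6"


def count_listings(html, parser="html.parser"):
    """Parse html with an explicitly named parser and count listing nodes."""
    soup = bs4.BeautifulSoup(html, parser)
    return len(soup.select(LISTING_SELECTOR))


def fetch_listing_count(url, expected=18, max_attempts=5,
                        delay_seconds=1.0, get=requests.get):
    """Fetch url up to max_attempts times and return the last count seen.

    Stops early once at least `expected` listings are found. Unlike an
    unbounded `while` loop, this gives up after max_attempts requests.
    `get` is injectable so the function can be exercised without network.
    """
    count = 0
    for attempt in range(1, max_attempts + 1):
        response = get(url, timeout=10)
        count = count_listings(response.text)
        if count >= expected:
            break
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # brief pause before asking again
    return count
```

Calling fetch_listing_count(url) with the URL from the question would then return the listing count from the first attempt that reaches 18, or the count from the final attempt if none does. Whether that removes the inconsistency depends on what the server actually sends back.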
