How to make BeautifulSoup output more consistent

Asked: 2018-03-23 06:24:39

Tags: python web-scraping beautifulsoup

I have the following code, which prints the number of listings on an Airbnb page:

import requests, bs4

url = 'https://www.airbnb.pl/s/Girona--Hiszpania/homes?place_id=ChIJRRrTHsPNuhIRQMqjIeD6AAM&query=Girona%2C%20Hiszpania&refinement_paths%5B%5D=%2Fhomes&allow_override%5B%5D=&s_tag=b5bnciXv'
response = requests.get(url)
soup = bs4.BeautifulSoup(response.text)

listings = soup.select('._f21qs6')

print(len(listings))

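The printed output should be "18", which is the number of listings on this page.

However, the output is inconsistent. Sometimes I do get "18", but other times it is "0".

Is there anything I can improve in this code to make the output more consistent?

Edit: I refactored the code to make the inconsistency in the output obvious: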

import requests, bs4

def get_listings():
    url = 'https://www.airbnb.pl/s/Girona--Hiszpania/homes?place_id=ChIJRRrTHsPNuhIRQMqjIeD6AAM&query=Girona%2C%20Hiszpania&refinement_paths%5B%5D=%2Fhomes&allow_override%5B%5D=&s_tag=b5bnciXv'
    response = requests.get(url)
    soup = bs4.BeautifulSoup(response.text, "html.parser")
    listings = soup.select('._f21qs6')
    return listings

def check_if_all_listings_downloaded():
    number_of_listings = len(get_listings())
    print("Current number of listings: " + str(number_of_listings))
    while number_of_listings != 18:
        print("Too few listings: " + str(number_of_listings))
        number_of_listings = len(get_listings())
    print("All fine! The number of listings is: " + str(number_of_listings))

check_if_all_listings_downloaded()

Sample output from the refactored code:

Current number of listings: 0
Too few listings: 0
Too few listings: 0
Too few listings: 16
All fine! The number of listings is: 18

1 Answer:

Answer 0 (score: 0):

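Specify the parser here: bs4.BeautifulSoup(response.text, 'html.parser') uses the built-in HTML parser instead of lxml. I tried your code with the HTML parser and the output seemed consistent. I am not sure why lxml shows the inconsistency. See "Differences between parsers" in the BeautifulSoup documentation for how the parsers differ.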
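As a rough sketch of the suggestion above, the following combines an explicitly named parser with a bounded retry, so that a short response is re-requested a limited number of times rather than looping forever. The function names count_listings and retry_until, and the max_attempts and delay parameters, are hypothetical and not part of the original code. The retry only works around the symptom and does not explain why some responses contain fewer listings.

```python
import time


def count_listings(url, selector="._f21qs6"):
    """Download the page once and count the elements matching `selector`."""
    # Imported here so retry_until below stays usable without these packages.
    import bs4
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    # Name the parser explicitly so the result does not depend on
    # which parsers happen to be installed.
    soup = bs4.BeautifulSoup(response.text, "html.parser")
    return len(soup.select(selector))


def retry_until(fetch_count, expected, max_attempts=5, delay=1.0):
    """Call fetch_count() until it returns `expected` or attempts run out.

    Returns a tuple (last_count, attempts_used).
    """
    count = None
    for attempt in range(1, max_attempts + 1):
        count = fetch_count()
        if count == expected:
            return count, attempt
        if attempt < max_attempts:
            time.sleep(delay)  # pause before asking the server again
    return count, max_attempts


# Hypothetical usage, with `url` set to the search URL from the question:
# count, attempts = retry_until(lambda: count_listings(url), expected=18)
```

Passing the fetch step in as a callable keeps the retry logic separate from the network call, which makes it possible to try the loop with canned values before pointing it at the real page.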