Question

I have the following scraping code:

import requests, bs4

def make_soup():
    url = 'https://www.airbnb.pl/s/Girona--Hiszpania/homes?place_id=ChIJRRrTHsPNuhIRQMqjIeD6AAM&query=Girona%2C%20Hiszpania&refinement_paths%5B%5D=%2Fhomes&allow_override%5B%5D=&s_tag=b5bnciXv'
    response = requests.get(url)
    soup = bs4.BeautifulSoup(response.text, "html.parser")
    return soup

def get_listings():
    soup = make_soup()
    listings = soup.select('._f21qs6')
    number_of_listings = len(listings)
    print("Current number of listings: " + str(number_of_listings))
    while number_of_listings != 18:
        print("Too few listings: " + str(number_of_listings))
        soup = make_soup()
        listings = soup.select('._f21qs6')
        number_of_listings = len(listings)
    print("All fine! The number of listings is: " + str(number_of_listings))
    return listings

new_listings = get_listings()
print(new_listings)

I assumed that get_listings() returns listings as a string, so I can't call BeautifulSoup's prettify() on it, and new_listings gets printed as a single block of text.

Is there a way to print new_listings as formatted HTML, or at least to print each tag on its own line?

Answer 1

type(new_listings)
# list

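This shows that new_listings is a list of Tag objects, not a string. prettify() is a method of the individual elements, not of the list, so try: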

print(new_listings[0].prettify())
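
To print every listing rather than only the first, one option is to call prettify() on each element in turn. The sketch below is only an illustration: the inline `html` string is made-up stand-in markup (so the example runs without a network request), and the name `pretty_listings` is hypothetical; only the `._f21qs6` selector comes from the question.

```python
import bs4

# Hypothetical stand-in markup; the real page would come from requests.get(url).text
html = (
    '<div class="_f21qs6"><a href="/rooms/1">First</a></div>'
    '<div class="_f21qs6"><a href="/rooms/2">Second</a></div>'
)

soup = bs4.BeautifulSoup(html, "html.parser")
listings = soup.select("._f21qs6")  # list-like collection of Tag objects

# prettify() exists on each Tag, so apply it per element, not to the list itself
pretty_listings = [tag.prettify() for tag in listings]

# Print the listings one after another, each as indented HTML
print("\n".join(pretty_listings))
```

With the code from the question, the same loop would apply to new_listings directly, e.g. `for tag in new_listings: print(tag.prettify())`.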

Answer 2

Try:

from pprint import pprint 
pprint(new_listings)

pprint formats many kinds of output nicely.

How to print BeautifulSoup's string output
