I have the following scraping code:
import requests
import bs4

def make_soup():
    # Fetch the search results page and parse it.
    url = 'https://www.airbnb.pl/s/Girona--Hiszpania/homes?place_id=ChIJRRrTHsPNuhIRQMqjIeD6AAM&query=Girona%2C%20Hiszpania&refinement_paths%5B%5D=%2Fhomes&allow_override%5B%5D=&s_tag=b5bnciXv'
    response = requests.get(url)
    soup = bs4.BeautifulSoup(response.text, "html.parser")
    return soup

def get_listings():
    soup = make_soup()
    listings = soup.select('._f21qs6')
    number_of_listings = len(listings)
    print("Current number of listings: " + str(number_of_listings))
    # Re-fetch the page until exactly 18 listings are found.
    while number_of_listings != 18:
        print("Too few listings: " + str(number_of_listings))
        soup = make_soup()
        listings = soup.select('._f21qs6')
        number_of_listings = len(listings)
    print("All fine! The number of listings is: " + str(number_of_listings))
    return listings

new_listings = get_listings()
print(new_listings)
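I think get_listings() returns listings as a string, so I can't use BeautifulSoup's prettify() on it, and new_listings is printed as a single block of text.
Is there a way to print new_listings as formatted HTML, or at least to print each tag on a separate line?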
Answer 0 (score: 1)
type(new_listings)
# list
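This shows that new_listings is a list, not a string. Each element is a Tag, so call prettify() on an element rather than on the list. Try: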
print(new_listings[0].prettify())
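To print every listing rather than only the first, one option is to prettify each element and join the results. This is just a rough sketch: the inline html string is made-up stand-in markup so the example runs without a network request, and prettify_all is a hypothetical helper name, not part of BeautifulSoup.

```python
import bs4

# Hypothetical stand-in markup; in the question this would be response.text.
html = (
    '<div class="_f21qs6"><a href="/rooms/1">First</a></div>'
    '<div class="_f21qs6"><a href="/rooms/2">Second</a></div>'
)

soup = bs4.BeautifulSoup(html, "html.parser")
# select() returns a list-like ResultSet of Tag objects, not a string.
listings = soup.select("._f21qs6")

def prettify_all(tags):
    """Return one indented HTML string covering every tag in the list."""
    return "\n".join(tag.prettify() for tag in tags)

pretty = prettify_all(listings)
print(pretty)
```

With the real page you would pass new_listings to the helper instead of the stand-in listings.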
Answer 1 (score: 0)
Try:
from pprint import pprint
pprint(new_listings)
pprint formats many kinds of output nicely.
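As a small illustration of what pprint does with a list that is too long for one line, here is a sketch that uses plain strings as stand-ins for the listing tags (the sample strings are made up). Note that pprint only puts each element on its own line; it does not indent the HTML inside an element the way prettify() does.

```python
from pprint import pformat

# Hypothetical stand-ins for the scraped listings, as plain strings.
sample_listings = [
    '<div class="_f21qs6"><a href="/rooms/1">First</a></div>',
    '<div class="_f21qs6"><a href="/rooms/2">Second</a></div>',
]

# pformat returns the text that pprint would print.
formatted = pformat(sample_listings)
print(formatted)
```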