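I need some help scraping multiple pages of a real estate website. I have written code that successfully scrapes page 1, and I tried to extend it to scrape all 25 pages, but I am now stuck. Any tips or help would be appreciated.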
Answer 0 (score: 0)
You should increment the page number each time you scrape a page. Try this:
import requests
from bs4 import BeautifulSoup
from csv import writer

base_url = 'https://www.rew.ca/properties/areas/kelowna-bc'

# Open the CSV once, before the page loop, so later pages do not overwrite earlier ones.
with open("property4.csv", "w", newline="") as csv_file:
    csv_writer = writer(csv_file)
    csv_writer.writerow(["title", "type", "price", "location", "bedrooms", "bathrooms", "square feet", "link"])

    # Pages 1 through 25: the page number is appended to the base URL.
    for i in range(1, 26):
        response = requests.get(f"{base_url}/page/{i}")
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        listings = soup.find_all("article")

        for listing in listings:
            location = listing.find(class_="displaypanel-info").get_text().strip()
            price = listing.find(class_="displaypanel-title hidden-xs").get_text().strip()
            link = listing.find("a").get('href').strip()
            title = listing.find("a").get('title').strip()
            # Renamed from `type` to avoid shadowing the built-in.
            property_type = listing.find(class_="clearfix hidden-xs").find(class_="displaypanel-info").get_text()
            details = listing.find_all("li")
            bedrooms = details[2].get_text()
            bathrooms = details[3].get_text()
            square_feet = details[4].get_text()
            csv_writer.writerow([title, property_type, price, location, bedrooms, bathrooms, square_feet, link])
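There are two ways to walk the pages here: count from 1 to 25, or follow the site's "next" link until there is none. Mixing both in one loop is what tends to cause trouble. The sketch below keeps them apart using only the standard library. The helper names `page_urls` and `next_page_url` are hypothetical, and the `/page/N` pattern is assumed from the URLs used in the answer. Note also that a check written as `... if "href" else None` never reaches the `else` branch, because a non-empty string literal is always truthy. The link itself has to be tested.

```python
from urllib.parse import urljoin


def page_urls(base_url, last_page, first_page=1):
    """Yield the URL of each results page, first_page..last_page inclusive.

    Assumes the site exposes pages as <base_url>/page/<n>.
    """
    for page in range(first_page, last_page + 1):
        yield f"{base_url}/page/{page}"


def next_page_url(current_url, next_href):
    """Resolve a 'next' link against the current page URL.

    Returns None when the link is missing or empty, which is the
    signal to stop paginating.
    """
    if not next_href:
        return None
    return urljoin(current_url, next_href)
```

With a known page count, `for url in page_urls(base_url, 25)` replaces the manual string concatenation. With an unknown count, a `while url:` loop can instead set `url = next_page_url(url, href)` after each page, where `href` is whatever the parser found for the next button, or `None` if it found nothing.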