How can I scrape all pages of a real-estate website with Python?

Time: 2020-05-07 03:34:13

Tags: python

I need some help scraping multiple pages of a real-estate website. I have written code that successfully scrapes page 1, and I tried to extend it to scrape all 25 pages, but I am now stuck. Any tips/help would be appreciated.



1 answer:

Answer 0 (score: 0)

You should increment the page number on each pass through the loop. Also open the CSV file once, before the loop, otherwise each page overwrites the previous one. Try this:

import requests
from bs4 import BeautifulSoup
from csv import writer

base_url = 'https://www.rew.ca/properties/areas/kelowna-bc'

# Open the CSV once and write the header a single time; opening it inside
# the loop in "w" mode would truncate the file on every page.
with open("property4.csv", "w", newline="") as csv_file:
    csv_writer = writer(csv_file)
    csv_writer.writerow(["title", "type", "price", "location", "bedrooms",
                         "bathrooms", "square feet", "link"])

    # Increment the page number on each iteration (25 pages in total).
    for i in range(1, 26):
        response = requests.get(f"{base_url}/page/{i}")
        soup = BeautifulSoup(response.text, "html.parser")
        listings = soup.find_all("article")

        for listing in listings:
            location = listing.find(class_="displaypanel-info").get_text().strip()
            price = listing.find(class_="displaypanel-title hidden-xs").get_text().strip()
            link = listing.find("a").get('href').strip()
            title = listing.find("a").get('title').strip()
            # Renamed from `type` to avoid shadowing the built-in.
            property_type = listing.find(class_="clearfix hidden-xs").find(class_="displaypanel-info").get_text()
            bedrooms = listing.find_all("li")[2].get_text()
            bathrooms = listing.find_all("li")[3].get_text()
            square_feet = listing.find_all("li")[4].get_text()
            csv_writer.writerow([title, property_type, price, location,
                                 bedrooms, bathrooms, square_feet, link])
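The pagination fix above boils down to requesting one URL per page number. As a stand-alone sketch (no network access needed), the 25 page URLs the loop visits can be built up front like this:

```python
# Build the list of page URLs in advance; the scraper then fetches each in turn.
base_url = 'https://www.rew.ca/properties/areas/kelowna-bc'
page_urls = [f"{base_url}/page/{i}" for i in range(1, 26)]

print(len(page_urls))   # 25
print(page_urls[0])     # https://www.rew.ca/properties/areas/kelowna-bc/page/1
```

If the total page count is not known in advance, an alternative is to follow the site's "next page" link until it disappears, but hard-coding the known count of 25 keeps the loop simple.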