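Scrapy returns a random value every time

Date: 2021-01-24 20:52:30

Tags: web-scraping scrapy

In this program I am trying to get all the rental prices in Ottawa, but each run returns only a single, seemingly random price. Why?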

import scrapy


class RentalPricesSpider(scrapy.Spider):
    name = 'rental_prices'
    allowed_domains = ['www.kijiji.ca']
    start_urls = ['https://www.kijiji.ca/b-real-estate/ottawa/c34l1700185']

    def parse(self, response):
        rental_price = response.xpath('normalize-space(//div[@class="price"]/text())').getall()
        yield {
            'rent': rental_price,
        }

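1 answer:

Answer 0 (score: 0)

Your XPath is the problem, which is why you are not getting the expected output. In XPath 1.0, normalize-space() takes a string argument; when it is given a node-set, only the first node in document order is converted to a string, so .getall() returns a single price. The value presumably differs between runs because the first listing on the page changes. Use the CSS selector div.price::text instead of the XPath and strip the whitespace in Python.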

import scrapy
from scrapy.crawler import CrawlerProcess


class RentalPricesSpider(scrapy.Spider):
    name = 'rental_prices'
    allowed_domains = ['www.kijiji.ca']
    start_urls = ['https://www.kijiji.ca/b-real-estate/ottawa/c34l1700185']

    def parse(self, response):
        # Select the text nodes of every price div, not just the first one.
        rental_price = response.css('div.price::text').getall()
        # Trim whitespace and drop text nodes that are empty after trimming.
        rental_price = [x.strip() for x in rental_price if x.strip()]
        yield {
            'rent': rental_price,
        }


# Run the spider as a standalone script and write the items to items.json.
process = CrawlerProcess(settings={
    "USER_AGENT" : "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36",
    "FEEDS": {
        "items.json": {"format": "json"},
    },
})

process.crawl(RentalPricesSpider)
process.start()