Question

I am trying to crawl a website, and after successfully scraping some pages I get stuck in a loop of scrapy.downloadermiddlewares.redirect DEBUG (301) messages. Can anyone help?

Here is my spider:

# -*- coding: utf-8 -*-
import scrapy
from scrapy.spiders import CrawlSpider
from scrapy.loader import ItemLoader
from demo_crawl.items import CarItem
import time
import random


class CarSales(CrawlSpider):
    name = 'cars'
    start_urls = ['https://www.carsales.com.au/cars/results/']

    def parse(self, response):
        for car in response.xpath("//div[@class='listing-item n_margin-20 showcase  ']"):
            loader = ItemLoader(item=CarItem(), selector=car, response=response)
            url = car.xpath(".//div[@class='n_width-max title ']/a[@href]/@href").extract_first()
            url = response.urljoin(url)
            loader.add_value('url', url)
            loader.add_xpath('price', ".//div[@class='price']")
            loader.add_xpath('type', "./@data-category")
            loader.add_xpath('section', "./@data-webm-section")
            yield loader.load_item()
        next_page = response.xpath(".//li[@class='next tippable']/a[@title='Next']/@href").extract_first()
        num = random.randint(1, 3)
        time.sleep(num)
        if next_page is not None:
            link = response.urljoin(next_page)
            yield scrapy.Request(url=link, callback=self.parse)

Here is the log output:

2019-07-15 14:46:37 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.carsales.com.au/cars/results/?offset=6972&setype=pagination&q=Service.Carsales.> from <GET https://www.carsales.com.au/cars/results?offset=6972&setype=pagination&q=Service.Carsales.>
2019-07-15 14:47:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.carsales.com.au/cars/results/?offset=7128&setype=pagination&q=Service.Carsales.> (referer: https://www.carsales.com.au/cars/results/?offset=7116&setype=pagination&q=Service.Carsales.) ['cached']

How can I resolve the scrapy.downloadermiddlewares.redirect (301) messages in Scrapy?
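
One observation from the log above: the 301 is logged at DEBUG level, its source and target URLs differ only by the slash before the `?` (`/cars/results?...` versus `/cars/results/?...`), and it is followed by a `Crawled (200)` line. So the redirects may simply be the site adding a trailing slash to each pagination link rather than a real loop. A minimal sketch of one way to avoid the extra round trip, assuming that is the cause, is to normalize the next-page link before requesting it. The helper name `ensure_trailing_slash` is hypothetical and not part of Scrapy; it uses only the standard library, and it assumes every listing URL on this site should end its path with a slash.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_trailing_slash(url):
    """Return url with a slash at the end of its path; query and fragment are kept as-is.

    Hypothetical helper: only suitable when the site expects directory-style paths,
    as the 301 target in the log suggests for /cars/results/.
    """
    parts = urlsplit(url)
    path = parts.path
    if not path.endswith("/"):
        path += "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# The redirected URL from the log, without the trailing slash on the path.
url = "https://www.carsales.com.au/cars/results?offset=6972&setype=pagination&q=Service.Carsales."
print(ensure_trailing_slash(url))
```

In the spider this could be applied as `link = ensure_trailing_slash(response.urljoin(next_page))` before yielding the request. If the crawl genuinely stops after these messages, the cause is likely elsewhere and this normalization alone would not fix it.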

0 answers: