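I'm trying to crawl a website, and after successfully scraping some pages I get stuck in what looks like a scrapy.downloadermiddlewares.redirect DEBUG (301) loop. Can anyone help?
Here is my spider: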
# -*- coding: utf-8 -*-
import random
import time

import scrapy
from scrapy.loader import ItemLoader
from scrapy.spiders import CrawlSpider

from demo_crawl.items import CarItem


class CarSales(CrawlSpider):
    name = 'cars'
    start_urls = ['https://www.carsales.com.au/cars/results/']

    def parse(self, response):
        # Build one item per listing on the results page.
        for car in response.xpath("//div[@class='listing-item n_margin-20 showcase ']"):
            loader = ItemLoader(item=CarItem(), selector=car, response=response)
            url = car.xpath(".//div[@class='n_width-max title ']/a[@href]/@href").extract_first()
            url = response.urljoin(url)
            loader.add_value('url', url)
            loader.add_xpath('price', ".//div[@class='price']")
            loader.add_xpath('type', "./@data-category")
            loader.add_xpath('section', "./@data-webm-section")
            yield loader.load_item()

        # Follow the "Next" pagination link after a random 1-3 second pause.
        next_page = response.xpath(".//li[@class='next tippable']/a[@title='Next']/@href").extract_first()
        num = random.randint(1, 3)
        time.sleep(num)
        if next_page is not None:
            link = response.urljoin(next_page)
            yield scrapy.Request(url=link, callback=self.parse)
Here is the error:
2019-07-15 14:46:37 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.carsales.com.au/cars/results/?offset=6972&setype=pagination&q=Service.Carsales.> from <GET https://www.carsales.com.au/cars/results?offset=6972&setype=pagination&q=Service.Carsales.>
2019-07-15 14:47:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.carsales.com.au/cars/results/?offset=7128&setype=pagination&q=Service.Carsales.> (referer: https://www.carsales.com.au/cars/results/?offset=7116&setype=pagination&q=Service.Carsales.) ['cached']
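In the log above, the two URLs in the 301 line appear to differ only by a trailing slash on the path (`/cars/results?...` versus `/cars/results/?...`). A minimal sketch of normalizing the link before requesting it, assuming that slash really is the only difference; `ensure_trailing_slash` is a hypothetical helper name, not something from Scrapy:

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_trailing_slash(url):
    """Return url with a trailing slash on its path, leaving the query untouched.

    Hypothetical helper: it assumes the site's 301 only adds the slash,
    which is what the log line suggests but does not prove.
    """
    parts = urlsplit(url)
    path = parts.path
    if not path.endswith("/"):
        path += "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# URL copied from the "Redirecting (301)" log line.
before = "https://www.carsales.com.au/cars/results?offset=6972&setype=pagination&q=Service.Carsales."
print(ensure_trailing_slash(before))
```

In the spider this could be applied as `link = ensure_trailing_slash(response.urljoin(next_page))` before yielding the request. Whether that changes the behaviour described here is untested against the real site.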