How to fix HttpErrorMiddleware in a Scrapy spider

Posted: 2019-08-20 22:30:42

Tags: python scrapy

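I'm trying to do horizontal crawling with Scrapy. Using XPath, I extract the link to each listing on a real-estate site, as well as the link to the next page.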

However, when I run the spider, I keep getting this error:

"scrapy.spidermiddlewares.httperror.HttpErrorMiddleware"

Thanks for your help.

import scrapy
from scrapy import Request
from scrapy.loader import ItemLoader
from scrapy.loader.processors import MapCompose

from ..items import PropertiesItem  # the project's item class (not shown)


class BasicSpider(scrapy.Spider):
    name = 'manual'
    allowed_domains = ['web']
    start_urls = ('https://www.nyhabitat.com/new-york-apartment/roommate-share',)

    def parse(self, response):
        next_selector = response.xpath('//*[@id="pagination-container"]/ul/li[8]/a/@href')
        for url in next_selector.extract_first():
            yield Request(response.urljoin(url))

        item_selector = response.xpath('//a[contains(@class, "slider-item-link")]//@href')

        for url in item_selector.extract():
            yield Request(response.urljoin(url))

    def parse_item(self, response):
        l = ItemLoader(item=PropertiesItem(), response=response)
        l.add_xpath('price', '//*[@id="availability"]/div/div[1]/b/text()',
                    MapCompose(lambda i: i.replace(',', ''), float),
                    re='[,.0-9]+')

Result:

'scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
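The lines above appear to be part of the list of enabled spider middlewares that Scrapy logs at startup, which is not an error by itself. A more likely reason the crawl stops after the first page is `allowed_domains = ['web']`: `OffsiteMiddleware` drops followed requests whose host is neither an allowed domain nor a subdomain of one (it logs these as filtered offsite requests at DEBUG level). The function below is a hypothetical, stdlib-only approximation of that host check, written to show why `'web'` rejects `www.nyhabitat.com` while `'nyhabitat.com'` accepts it; it is not Scrapy's own code.

```python
from urllib.parse import urlparse


def is_on_allowed_domain(url, allowed_domains):
    """Approximate OffsiteMiddleware's host check (hypothetical helper).

    A host passes if it equals an allowed domain or is a subdomain of one.
    """
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


listing = "https://www.nyhabitat.com/new-york-apartment/roommate-share?page=2"
print(is_on_allowed_domain(listing, ["web"]))            # False: request filtered
print(is_on_allowed_domain(listing, ["nyhabitat.com"]))  # True: request kept
```

Matching on a leading dot is what keeps a look-alike host such as `evilnyhabitat.com` from passing as a subdomain.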
