Question

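I am running the following Scrapy spider on Hostelworld.com to retrieve:

The continents, countries, and country URLs on the first page

The list of cities for a given country, after following that country's URL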

def parse_page1(self, response):
    for sel in response.xpath('//li[@class="accordion-navigation"]//ul[@class="small-block-grid-2 medium-block-grid-4 large-block-grid-6"]/li'):
        item = HostelWorldItem()
        item['continent'] = sel.xpath('./../../@id').extract_first()
        item['country'] = sel.xpath('./a/text()').extract_first()
        item['country_url'] = sel.xpath('./a/@href').extract_first()

        yield item

        url = response.urljoin('%s'%(item['country_url']))
        request = scrapy.Request(url, callback=self.parse_dir_contents)
        request.meta['item'] = item
        yield request

def parse_dir_contents(self, response):
    item = response.meta['item']
    item['city'] = response.xpath('//div[@class="otherlocations"]/li/a/text()').extract_first()
    yield item

Running it produces the following error, and I cannot find a solution:

scrapy/spiders/__init__.py", line 76, in parse
raise NotImplementedError
NotImplementedError

Thank you very much for your help!

Answer 1

A Scrapy Spider must define a parse() method, and yours does not have one.

By default, scrapy.Spider issues a request for each URL in start_urls and uses self.parse as the callback.
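The default callback chain described above can be modelled in a few lines of plain Python. This is a simplified stand-in written for illustration, not Scrapy's actual source: the Request, BaseSpider, BrokenSpider, FixedSpider, and crawl names are all hypothetical, and no network access is involved.

```python
class Request:
    """Minimal stand-in for a request: a URL plus the callback to run."""

    def __init__(self, url, callback):
        self.url = url
        self.callback = callback


class BaseSpider:
    """Simplified model of a base spider (an assumption, not Scrapy code)."""

    start_urls = []

    def start_requests(self):
        # Every start URL is scheduled with self.parse as its callback.
        for url in self.start_urls:
            yield Request(url, callback=self.parse)

    def parse(self, response):
        # The base class has no default behaviour; subclasses must override.
        raise NotImplementedError


class BrokenSpider(BaseSpider):
    start_urls = ["http://example.com"]

    def parse_page1(self, response):
        # Never called: no request names this method as its callback.
        return ["item"]


class FixedSpider(BaseSpider):
    start_urls = ["http://example.com"]

    def parse(self, response):
        # Overrides the base method, so the default callback finds it.
        return ["item"]


def crawl(spider):
    """Run each start request's callback against a dummy response."""
    results = []
    for request in spider.start_requests():
        results.extend(request.callback("<html></html>"))
    return results
```

In this model, crawl(FixedSpider()) collects the item, while crawl(BrokenSpider()) raises NotImplementedError, which mirrors the error in the question: parse_page1 exists, but nothing ever calls it.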

Answer 2

You need to implement the parse() method, which the base class leaves unimplemented at https://github.com/scrapy/scrapy/blob/master/scrapy/spiders/__init__.py#L89.

Scrapy returns NotImplementedError
