Scrapy spider yields several items, but the pipeline is only called for the first one per request

Time: 2013-09-20 13:09:40

Tags: python scrapy

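Even though my spider yields several items per request, only the first one (per request) makes it through the pipeline and actually gets saved as a Django model instance.

Here is my code. What am I missing?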

# my_spider.py
class MySpider(CrawlSpider):
    name = 'my_spider'
    ...

    def parse(self, response):
        x = HtmlXPathSelector(response)
        item = MyDjangoItem()
        headings = x.select('//h2/text()').extract()
        for h in headings:
            item['name'] = h
            yield item

        url = 'http://example.com/next'  # I have custom rules for constructing (not extracting) next url
        yield Request(url, callback=self.parse)

# pipelines.py
class MyPipeline(object):
    def process_item(self, item, spider):
        if spider.name == 'my_spider':
            if item['name']:
                item.save()
        return item

1 answer:

Answer 0 (score: 9)

You need to move the MyDjangoItem instantiation inside the for loop. Otherwise every iteration mutates and yields the same object.

# my_spider.py
class MySpider(CrawlSpider):
    name = 'my_spider'
    ...

    def parse(self, response):
        x = HtmlXPathSelector(response)

        headings = x.select('//h2/text()').extract()
        for h in headings:
            # Create a fresh item for every heading instead of reusing one object
            item = MyDjangoItem()
            item['name'] = h
            yield item

        url = 'http://example.com/next'  # I have custom rules for constructing (not extracting) next url
        yield Request(url, callback=self.parse)
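
The difference between the two versions can be reproduced without Scrapy. The following is only a minimal sketch in plain Python: it uses ordinary dicts in place of MyDjangoItem, and the function names `shared_item` and `fresh_items` are made up for this illustration. It shows that a generator which reuses one mutable object hands the consumer the same object every time, so anything that keeps a reference to it only ever sees one item.

```python
def shared_item(headings):
    """Mimics the question's code: one object created before the loop."""
    item = {}  # stand-in for MyDjangoItem(), an assumption for this sketch
    for h in headings:
        item['name'] = h  # overwrites the field on the same object
        yield item


def fresh_items(headings):
    """Mimics the answer's code: a new object for every heading."""
    for h in headings:
        item = {}  # new object on each iteration
        item['name'] = h
        yield item


headings = ['first', 'second', 'third']

shared = list(shared_item(headings))
fresh = list(fresh_items(headings))

# How many distinct objects did each generator produce?
shared_ids = len({id(i) for i in shared})
fresh_ids = len({id(i) for i in fresh})

print(shared_ids, [i['name'] for i in shared])
print(fresh_ids, [i['name'] for i in fresh])
```

In the first case all three entries refer to a single dict holding the last heading. In the second case there are three independent dicts, one per heading, which is what a pipeline that saves each item needs.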