Why isn't item processing in my Scrapy pipeline concurrent?

Date: 2014-10-10 14:46:50

Tags: python scrapy

How do I make the pipeline work asynchronously? I thought it already did, given the description of CONCURRENT_ITEMS in the settings documentation:

    Maximum number of concurrent items (per response) to process in
    parallel in the Item Processor (also known as the Item Pipeline).
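
For context, that setting goes in the project's settings.py. A minimal sketch (100 is the documented default):

# settings.py
CONCURRENT_ITEMS = 100  # documented default; raise it to allow more items in flight per response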

Here is my pipeline:

from time import sleep

from scrapy import log


class TestPipeline:
    def __init__(self):
        self.x = 0

    def process_item(self, item, spider):
        self.x += 1
        log.msg(str(self.x))
        sleep(2)  # simulate slow, blocking processing
        return item
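
Note that sleep() blocks Scrapy's Twisted reactor, so CONCURRENT_ITEMS cannot parallelize anything here: that setting only applies when process_item returns a Deferred instead of blocking. A minimal non-blocking sketch of the same pipeline, using twisted.internet.task.deferLater (the name NonBlockingTestPipeline is made up for illustration):

from scrapy import log
from twisted.internet import reactor
from twisted.internet.task import deferLater


class NonBlockingTestPipeline:
    def __init__(self):
        self.x = 0

    def process_item(self, item, spider):
        self.x += 1
        log.msg(str(self.x))
        # Return a Deferred that fires with the item 2 seconds later;
        # the reactor stays free to process other items in the meantime.
        return deferLater(reactor, 2, lambda: item)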

The log:

2014-10-10 17:34:55+0300 [scrapy] INFO: Enabled item pipelines: TestPipeline
2014-10-10 17:34:55+0300 [myspider] INFO: Spider opened
2014-10-10 17:34:55+0300 [myspider] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-10-10 17:34:56+0300 [scrapy] INFO: 1
2014-10-10 17:34:58+0300 [scrapy] INFO: 2
2014-10-10 17:35:00+0300 [scrapy] INFO: 3
2014-10-10 17:35:02+0300 [scrapy] INFO: 4
2014-10-10 17:35:04+0300 [scrapy] INFO: 5
2014-10-10 17:35:06+0300 [scrapy] INFO: 6
2014-10-10 17:35:08+0300 [scrapy] INFO: 7
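
The timestamps above are exactly 2 seconds apart, i.e. the items go through the pipeline one at a time, which is what the blocking sleep() forces. If the real work were inherently blocking (for example a synchronous database call), one common workaround, sketched here as an assumption rather than taken from the question, is to push it onto the reactor's thread pool with twisted.internet.threads.deferToThread:

from time import sleep

from twisted.internet.threads import deferToThread


class ThreadedPipeline:
    def process_item(self, item, spider):
        # deferToThread returns a Deferred that fires with the result
        # of _do_blocking_work once the thread-pool thread finishes.
        return deferToThread(self._do_blocking_work, item)

    def _do_blocking_work(self, item):
        sleep(2)  # blocking call, but now off the reactor thread
        return item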

My spider (the xpath, the regex, and the items-module import below are placeholders):

from scrapy import log
from scrapy.contrib.spiders import CrawlSpider

from myproject.items import Item  # placeholder: the project's Item with url/filename fields


class myspider(CrawlSpider):
    name = "myspider"
    allowed_domains = ["example.com"]
    start_urls = [
        "example.com"
    ]

    rules = (
        ...
    )

    def __init__(self, name=None, *args, **kwargs):
        super(myspider, self).__init__(*args, **kwargs)
        log.start()

    def parse_page(self, response):
        # "some xpath" and "some regex" are placeholders from the question
        links = response.xpath("some xpath")

        for link in links:
            item = Item()
            try:
                (item['url'], item['filename']) = link.re("some regex")
            except ValueError:
                # the link did not match the expected pattern; skip it
                continue

            yield item

0 Answers:

No answers