How can I make my pipeline work asynchronously? I thought it already did, because of the description of CONCURRENT_ITEMS:

Maximum number of concurrent items (per response) to process in parallel in the Item Processor (also known as the Item Pipeline).
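For reference, my understanding is that this is just a setting in settings.py, and 100 is the documented default, so I have not even changed it:

# settings.py
# CONCURRENT_ITEMS caps how many items from a single response
# the item pipeline processes in parallel (default: 100).
CONCURRENT_ITEMS = 100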
This is my pipeline:

from time import sleep

from scrapy import log


class TestPipeline:
    def __init__(self):
        self.x = 0

    def process_item(self, item, spider):
        self.x += 1
        log.msg(str(self.x))
        sleep(2)  # stand-in for slow per-item work
        return item
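If I understand Twisted correctly, the blocking sleep(2) may itself be the problem: it stalls the reactor, so no other item can be processed in the meantime, and CONCURRENT_ITEMS never comes into play. The pipeline docs say process_item may return a Deferred, so I suspect a non-blocking version would look something like this sketch (the 2-second delay just stands in for real asynchronous work):

from twisted.internet import reactor
from twisted.internet.task import deferLater

from scrapy import log


class TestPipeline:
    def __init__(self):
        self.x = 0

    def process_item(self, item, spider):
        self.x += 1
        log.msg(str(self.x))
        # Return a Deferred that fires with the item 2 seconds later;
        # the reactor stays free, so items can be processed concurrently.
        return deferLater(reactor, 2, lambda: item)

Is returning a Deferred like this the intended way to get concurrency here?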
The log (note that each item is logged exactly 2 seconds after the previous one, i.e. items are processed one at a time):
2014-10-10 17:34:55+0300 [scrapy] INFO: Enabled item pipelines: TestPipeline
2014-10-10 17:34:55+0300 [myspider] INFO: Spider opened
2014-10-10 17:34:55+0300 [myspider] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-10-10 17:34:56+0300 [scrapy] INFO: 1
2014-10-10 17:34:58+0300 [scrapy] INFO: 2
2014-10-10 17:35:00+0300 [scrapy] INFO: 3
2014-10-10 17:35:02+0300 [scrapy] INFO: 4
2014-10-10 17:35:04+0300 [scrapy] INFO: 5
2014-10-10 17:35:06+0300 [scrapy] INFO: 6
2014-10-10 17:35:08+0300 [scrapy] INFO: 7
My spider:

from scrapy import log
from scrapy.contrib.spiders import CrawlSpider

from myproject.items import Item  # placeholder path; Item defines 'url' and 'filename' fields


class myspider(CrawlSpider):
    name = "myspider"
    allowed_domains = ["example.com"]
    start_urls = [
        "example.com"
    ]
    rules = (
        ...
    )

    def __init__(self, name=None, *args, **kwargs):
        super(myspider, self).__init__(*args, **kwargs)
        log.start()

    def parse_page(self, response):
        links = response.xpath('some xpath')
        for link in links:
            item = Item()
            try:
                (item['url'], item['filename']) = link.re('some regex')
            except ValueError:
                continue
            yield item