Can someone check whether the code below is correct? It comes from http://readthedocs.org/docs/scrapy/en/0.14/topics/exporters.html
I think it is not correct, because:
Thanks for your help.
# Imports needed for this snippet (Scrapy 0.14-era module paths)
from scrapy.xlib.pydispatch import dispatcher
from scrapy import signals
from scrapy.contrib.exporter import XmlItemExporter

class XmlExportPipeline(object):

    def __init__(self):
        # React to spiders being opened and closed
        dispatcher.connect(self.spider_opened, signals.spider_opened)
        dispatcher.connect(self.spider_closed, signals.spider_closed)
        self.files = {}

    def spider_opened(self, spider):
        # One output file per spider, but only a single shared exporter
        file = open('%s_products.xml' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = XmlItemExporter(file)
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item
Answer 0 (score: 1)
I think this question would be better asked in the scrapy-users group.
AFAIK, as of v0.14 Scrapy does not support running multiple spiders in a single process (related discussion), so this code will work as-is. The obvious fix to make it safe for multiple spiders is to keep a dict of exporters keyed by spider:
class XmlExportPipeline(object):

    def __init__(self):
        dispatcher.connect(self.spider_opened, signals.spider_opened)
        dispatcher.connect(self.spider_closed, signals.spider_closed)
        self.files = {}
        # One exporter per spider instead of a single shared attribute
        self.exporters = {}

    def spider_opened(self, spider):
        file = open('%s_products.xml' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporters[spider] = XmlItemExporter(file)
        self.exporters[spider].start_exporting()

    def spider_closed(self, spider):
        self.exporters[spider].finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        # Route each item to the exporter belonging to its spider
        self.exporters[spider].export_item(item)
        return item
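Either version of the pipeline also has to be enabled in the project settings or it will never receive items. A minimal sketch, assuming the class lives in a hypothetical myproject/pipelines.py module (the module path is a placeholder; in Scrapy 0.14 ITEM_PIPELINES is a plain list rather than a dict):

# settings.py
ITEM_PIPELINES = [
    'myproject.pipelines.XmlExportPipeline',
]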