我有一个以文件为参数的蜘蛛,此文件包含xpath。
蜘蛛解析文件并获取xpath并开始抓取。
一切都运转良好
现在,我想多次运行那只蜘蛛,所以我这样做了:
script.py
def setup_crawler(file):
spider = MySpider(attributesXMLFilePath=file)
settings = get_project_settings()
crawler = Crawler(settings)
crawler.configure()
crawler.crawl(spider)
crawler.start()
for oneFile in myFiles:
setup_crawler(oneFile')
log.start()
reactor.run()
并在MySpider
我这样做:
def __init__(self, attributesXMLFilePath):
dispatcher.connect(self.spider_closed, signals.spider_closed)
def spider_closed(self, spider):
log.msg('The number of pages in the spider {1} are {0}'.format(self.numbers, self.attributesXMLFilePath))
log.msg('The number of details pages in the spider {1} are {0}'.format(self.numbers2, self.attributesXMLFilePath))
log.msg('The spider {0} with xml {2} finished working on {1}'.format(self.name, datetime.now(), self.attributesXMLFilePath), level=log.INFO)
但是在日志文件中,我看到了:
2014-06-08 18:18:03+0300 [scrapy] INFO: The number of pages in the spider file1.xml are 1
2014-06-08 18:18:03+0300 [scrapy] INFO: The number of pages in the spider file1.xml are 1
2014-06-08 18:18:03+0300 [scrapy] INFO: The number of details pages in the spider file1.xml are 0
2014-06-08 18:18:03+0300 [scrapy] INFO: The number of details pages in the spider file1.xml are 0
2014-06-08 18:18:03+0300 [scrapy] INFO: The spider MySpider with xml file1.xml finished working on 2014-06-08 18:18:03.746000
2014-06-08 18:18:03+0300 [scrapy] INFO: The spider MySpider with xml file1.xml finished working on 2014-06-08 18:18:03.746000
2014-06-08 18:18:03+0300 [scrapy] INFO: The number of pages in the spider file2.xml are 1
2014-06-08 18:18:03+0300 [scrapy] INFO: The number of pages in the spider file2.xml are 1
2014-06-08 18:18:03+0300 [scrapy] INFO: The number of details pages in the spider file2.xml are 0
2014-06-08 18:18:03+0300 [scrapy] INFO: The number of details pages in the spider file2.xml are 0
2014-06-08 18:18:03+0300 [scrapy] INFO: The spider MySpider with xml file2.xml finished working on 2014-06-08 18:18:03.748000
2014-06-08 18:18:03+0300 [scrapy] INFO: The spider MySpider with xml file2.xml finished working on 2014-06-08 18:18:03.748000
如你所见:
我的日志文件中的日志数据很多,我只是向您展示了解释我的问题的示例
Ofc我没有使用MySpider
,file1.xml
和file2.xml
名称,但我无法向您显示隐私问题的真实姓名。