Scrapy spider closed

Time: 2019-02-12 18:43:31

Tags: python-3.x web-scraping scrapy

I need to run a script after my spider finishes. I see that Scrapy has a handler called spider_closed(), but what I don't understand is how to incorporate it into my script. What I want to do is, once the scraper has finished crawling, combine all of my CSV files and load them into a sheet. If anyone can help with this, that would be great.

2 answers:

Answer 0 (score: 1)

Following the example in the documentation, you add the following to your Spider:

from scrapy import signals  # needed at the top of your spider module

# This function remains as-is.
@classmethod
def from_crawler(cls, crawler, *args, **kwargs):
    spider = super().from_crawler(crawler, *args, **kwargs)
    crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
    return spider

# This is where you do your CSV combination.
def spider_closed(self, spider):
    # Whatever is here will run when the spider is done.
    combine_csv_to_sheet()
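
combine_csv_to_sheet() is left undefined in the answer; here is a minimal sketch of what it could look like, assuming pandas is installed (plus openpyxl for the .xlsx output) and that the spiders write their CSVs into a single known directory. The directory and file names are illustrative, not part of the original answer.

import glob

import pandas as pd


def combine_csv_to_sheet(csv_dir='output', out_path='combined.xlsx'):
    # Read every CSV in the (assumed) output directory.
    frames = [pd.read_csv(path) for path in glob.glob(f'{csv_dir}/*.csv')]
    if not frames:
        return  # nothing was scraped, nothing to combine
    # Concatenate all rows and write them to a single worksheet.
    combined = pd.concat(frames, ignore_index=True)
    combined.to_excel(out_path, index=False)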

Answer 1 (score: 1)

As per my comment on the other answer about a signal-based solution, here is a way to run some code after multiple spiders are done. It does not involve using the spider_closed signal.

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


process = CrawlerProcess(get_project_settings())
process.crawl('spider1')
process.crawl('spider2')
process.crawl('spider3')
process.crawl('spider4')
process.start()

# CSV combination code goes here. It will only run when all the spiders are done.
# ...
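
For the placeholder at the end, one option (assuming a combine_csv_to_sheet() helper like the sketch in the other answer) is simply:

# process.start() blocks, so this runs only once all four spiders have finished.
combine_csv_to_sheet()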