How do I call a method once all links have been crawled?

Date: 2018-10-19 14:43:54

Tags: python scrapy web-crawler

I am crawling a website with Scrapy and want to write specific links to a file. I collect the links in a set stored as a class variable. How can I run the `write_to_file` method once the crawl has finished?

class MainSpider(CrawlSpider):
    name = 'spiderName'
    allowed_domains = [DOMAIN_NAME]
    start_urls = [STARTING_URL]
    product_links = set()
    rules = (
        # call parse_link on all links from the starting url
        Rule(LinkExtractor(), callback='parse_link', follow=True),)
    # these two lines run at class-definition time, before the crawl
    # even starts, so product_links is still empty here:
    print("product link size is " + str(len(product_links)))
    write_to_file(name, product_links)
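The question never shows `write_to_file` itself; a minimal sketch of such a helper (the name and the `<name>_links.txt` output path are assumptions, not from the question) could look like:

```python
def write_to_file(name, product_links):
    """Hypothetical helper: write each collected link on its own line.

    The output filename pattern is an assumption for illustration only.
    """
    with open(f"{name}_links.txt", "w", encoding="utf-8") as f:
        for link in sorted(product_links):
            f.write(link + "\n")
```

Sorting the set before writing just makes the output deterministic; the real issue in the question is *when* this helper gets called, which the answer below addresses.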

1 Answer:

Answer 0: (score: 1)

You can register a signal listener through the dispatcher.

I would try:

from scrapy import signals
# note: scrapy.xlib.pydispatch was removed in later Scrapy releases;
# modern code should connect via crawler.signals in from_crawler instead
from scrapy.xlib.pydispatch import dispatcher

class MySpider(CrawlSpider):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def spider_closed(self, spider):
        # second param is the instance of the spider that is about to be
        # closed; this is the place to flush the collected links to disk
        write_to_file(spider.name, spider.product_links)
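The mechanism behind this answer is ordinary signal dispatch: the spider registers a callback at construction time, and the engine fires the `spider_closed` signal when the crawl ends. A toy, Scrapy-free sketch of that pattern (the `Signal` class and `ToySpider` are illustrative stand-ins, not Scrapy APIs):

```python
class Signal:
    """Toy stand-in for a pydispatch-style signal: listeners register a
    callback, and send() invokes every registered callback in order."""

    def __init__(self):
        self._listeners = []

    def connect(self, callback):
        self._listeners.append(callback)

    def send(self, *args, **kwargs):
        for cb in self._listeners:
            cb(*args, **kwargs)

spider_closed = Signal()

class ToySpider:
    def __init__(self):
        self.product_links = {"https://example.com/p/1"}
        self.written = None
        # register interest in the "crawl finished" event, exactly as
        # dispatcher.connect(self.spider_closed, signals.spider_closed) does
        spider_closed.connect(self.on_closed)

    def on_closed(self, spider):
        # by the time this fires, product_links is fully populated
        self.written = sorted(spider.product_links)

spider = ToySpider()
spider_closed.send(spider)  # the engine fires this when the crawl ends
```

In current Scrapy the idiomatic route is the `from_crawler` classmethod with `crawler.signals.connect(...)`, or simply defining a `closed(self, reason)` method on the spider, which Scrapy calls automatically as a shortcut for the `spider_closed` signal.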