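I am crawling a website with Scrapy and want to write specific links to a file. I built a set of the links I want to write and stored it in a class variable. How can I run the write_to_file method once the crawl has finished?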
class MainSpider(CrawlSpider):
    name = 'spiderName'
    allowed_domains = [DOMAIN_NAME]
    start_urls = [STARTING_URL]
    product_links = set()
    rules = (
        # call parse_link on all links from starting url
        Rule(LinkExtractor(), callback='parse_link', follow=True),)

    print("product link size is " + str(len(product_links)))
    write_to_file(name, product_links)
Answer 0 (score: 1)
You can register a signal listener through the dispatcher.
I would try something like this:
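from scrapy import signals
from scrapy.spiders import CrawlSpider
from scrapy.xlib.pydispatch import dispatcher  # deprecated in later Scrapy releases

class MySpider(CrawlSpider):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)  # CrawlSpider compiles its rules here
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def spider_closed(self, spider):
        # The second parameter is the spider instance that is about to be closed.
        write_to_file(spider.name, spider.product_links)  # helper from the question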