CrawlSpider-derived object has no attribute 'state'

Time: 2014-11-20 22:04:23

Tags: python web-scraping scrapy scrapy-spider

I am trying to use spider.state as described at http://doc.scrapy.org/en/0.22/topics/jobs.html, but I get the error:

MyCrawlSpider has no attribute 'state'

I am trying to use it in the __init__() function of my CrawlSpider-derived class. Could that be the problem?

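from datetime import datetime

from scrapy.contrib.spiders import CrawlSpider  # import path as of Scrapy 0.22


class MyCrawlSpider(CrawlSpider):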
    crawl_start = datetime.utcnow().isoformat()

    def __init__(self, *args, **kwargs):
        super(MyCrawlSpider, self).__init__(*args, **kwargs)

        if self.state.get('crawl_start'):
            crawl_start = self.state.get('crawl_start')
        else:
            self.state["crawl_start"] = crawl_start

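My goal is for the crawl_start attribute to always hold the isoformat datetime string of when my crawler was first started, no matter how many times it is later resumed.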

1 answer:

Answer 0 (score: 2)

According to the source code, the state attribute is set on the spider in the spider_opened signal handler of the scrapy.contrib.spiderstate.SpiderState extension:

class SpiderState(object):
    """Store and load spider state during a scraping job"""
    ...

    def spider_closed(self, spider):
        if self.jobdir:
            with open(self.statefn, 'wb') as f:
                pickle.dump(spider.state, f, protocol=2)

    def spider_opened(self, spider):
        if self.jobdir and os.path.exists(self.statefn):
            with open(self.statefn, 'rb') as f:
                spider.state = pickle.load(f)
        else:
            spider.state = {}

The spider_opened signal is sent after the __init__() method has run, so the spider instance does not have a state attribute yet at that point. That is why you are getting the error.
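To make the ordering concrete, here is a minimal sketch that does not use Scrapy at all. FakeSpiderState, FakeSpider, remember_crawl_start and run are hypothetical stand-ins made up for illustration; only the idea that state is assigned in a spider_opened handler comes from the source shown above. It also shows one possible way to keep the first crawl_start across resumes, by reading state only after the handler has run:

```python
from datetime import datetime


class FakeSpiderState(object):
    """Hypothetical stand-in for the SpiderState extension (no JOBDIR, no pickle)."""

    def __init__(self, saved=None):
        # 'saved' plays the role of the state loaded from a previous run.
        self.saved = saved

    def spider_opened(self, spider):
        # Same idea as the real handler: state only appears at this point.
        spider.state = dict(self.saved) if self.saved else {}


class FakeSpider(object):
    """Hypothetical spider that does not touch self.state in __init__."""

    def __init__(self):
        # At construction time the handler has not run yet.
        self.has_state_in_init = hasattr(self, 'state')

    def remember_crawl_start(self):
        # Call only after spider_opened; setdefault keeps the first stored value.
        return self.state.setdefault('crawl_start',
                                     datetime.utcnow().isoformat())


def run(saved=None):
    spider = FakeSpider()                         # __init__ runs first
    FakeSpiderState(saved).spider_opened(spider)  # handler runs afterwards
    return spider, spider.remember_crawl_start()


fresh_spider, first_start = run()
resumed_spider, resumed_start = run(saved=fresh_spider.state)
```

In a real spider, the equivalent of remember_crawl_start would need to live somewhere that runs after the spider has been opened, for instance inside a request callback. That placement is an assumption on my part, not something the answer above states.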