As far as I know, the master node runs the NameNode, which maintains metadata in two files: the FSImage and the edit log. When the Hadoop system starts, the FSImage is loaded first; it contains the directory structure of the cluster and the stored data. Then, for every transaction that occurs, the edit log file is updated.
My questions are as follows:
Answer 0 (score: 2)
To understand this, we need to walk step by step through what happens while Hadoop is running.
After loading the FSImage, the NameNode holds a complete snapshot of the stored data in memory.
As transactions come in, their information is recorded in the edit log.
Periodically, by default every hour, the checkpoint node / secondary NameNode retrieves the logs, merges them with the latest fsimage, and saves the result as a checkpoint. At that point the NN has the image in memory, the edit log has been emptied, and the latest checkpoint is stored as an image on the SNN/CN.
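A minimal sketch of this cycle in Java (all class and method names here are hypothetical and purely illustrative; in real Hadoop the corresponding logic lives inside the NameNode, e.g. in its FSImage and FSEditLog machinery):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the NN / SNN checkpoint cycle.
public class CheckpointSketch {

    // In-memory namespace: what the loaded fsimage represents at run time.
    static Map<String, String> namespace = new HashMap<>();
    // Edit log: one entry per transaction, accumulated between checkpoints.
    static List<String[]> editLog = new ArrayList<>();

    // Every transaction updates the in-memory image and is appended to the
    // edit log; the on-disk fsimage is NOT rewritten here.
    static void logTransaction(String op, String path) {
        editLog.add(new String[] { op, path });
        if (op.equals("CREATE")) namespace.put(path, "");
        else if (op.equals("DELETE")) namespace.remove(path);
    }

    // What the SNN/CN does periodically: replay the edit log over the last
    // fsimage, persist the merged result as the new checkpoint, and let the
    // edit log start over empty.
    static Map<String, String> checkpoint(Map<String, String> lastFsImage) {
        Map<String, String> merged = new HashMap<>(lastFsImage);
        for (String[] tx : editLog) {
            if (tx[0].equals("CREATE")) merged.put(tx[1], "");
            else if (tx[0].equals("DELETE")) merged.remove(tx[1]);
        }
        editLog.clear();
        return merged; // this becomes the latest fsimage / checkpoint
    }

    public static void main(String[] args) {
        Map<String, String> fsimage = new HashMap<>(); // loaded at startup
        logTransaction("CREATE", "/user/data/a");
        logTransaction("CREATE", "/user/data/b");
        logTransaction("DELETE", "/user/data/a");
        fsimage = checkpoint(fsimage);
        System.out.println("paths in new fsimage: " + fsimage.keySet()); // [/user/data/b]
    }
}
```

Note that a DELETE is appended to the edit log exactly like any other transaction, which is relevant to your last question.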
To answer your questions:
Yes, there are only these two files.
The fsimage on the SNN/CN is updated periodically. The fsimage on the NN is updated when a checkpoint is imported; this should happen at least on every restart.
Merging the edit log into the fsimage is an expensive operation: it requires the NameNode to enter safe mode to merge the data, which is not feasible on a live cluster.
A deletion is a log entry and a write like any other, so it is stored in the edit log (see the DELETE transaction in the sketch above).
Answer 1 (score: 0)
1) Yes, only these two files are there.
2) This is true for the name node.
3) It is copied to the secondary name node for persistent storage. Things work fine as long as the name node is up. Say you have made many changes at run time, such as creating directories and files or putting data into HDFS; this information is loaded directly into memory. But what if the name node goes down? Any new metadata that is not yet embedded in the current fsimage would be lost permanently, because when the system comes back up it loads the fsimage into memory, and since that is the old fsimage it will not have the new changes. With the secondary name node we preserve these changes in edit.log; the edit.log is eventually applied to the fsimage, and the new fsimage replaces the old one. A sketch of this recovery follows below.
4) The process is: whenever metadata changes, the event is written to the edit.log file. After some specified interval these logs are copied to the secondary name node, and when their size gets too big the edit.log information is flushed into the form of a new fsimage.
The current fsimage is not updated on each file addition or deletion; these changes are applied directly in memory.
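To make point 3 concrete, here is a small hypothetical Java sketch (not Hadoop code) showing why a crash loses any metadata that exists only in memory, and how replaying the persisted edit log over the old fsimage recovers it:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: in-memory state is lost on a crash,
// but fsimage + edit.log together can rebuild it.
public class RecoverySketch {
    public static void main(String[] args) {
        // Persisted before the crash: the (old) fsimage and the edit log.
        Map<String, String> fsimageOnDisk = new HashMap<>();
        fsimageOnDisk.put("/old/file", "");
        List<String[]> editLogOnDisk = new ArrayList<>();

        // Run time: new changes live in memory and are appended to the log.
        Map<String, String> inMemory = new HashMap<>(fsimageOnDisk);
        inMemory.put("/new/file", "");
        editLogOnDisk.add(new String[] { "CREATE", "/new/file" });

        // Crash: inMemory is gone. On restart, load the old fsimage and
        // replay the edit log to recover the changes it did not contain.
        Map<String, String> recovered = new HashMap<>(fsimageOnDisk);
        for (String[] tx : editLogOnDisk) {
            if (tx[0].equals("CREATE")) recovered.put(tx[1], "");
            else if (tx[0].equals("DELETE")) recovered.remove(tx[1]);
        }
        System.out.println(recovered.containsKey("/new/file")); // true
    }
}
```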
Answer 2 (score: 0)
Yes, these are the only two files that contain the cluster's file system information.
No. The FSImage is written to disk every time the Name node restarts, and the SNN writes the FSImage to disk at every checkpoint.
On a busy cluster the EditLog grows quickly, and if it is very large the next NN restart will take much longer. The SNN periodically merges the EditLog and the FSImage; it also serves as a backup of the FSImage if your NN disk fails. The checkpoint frequency that drives this merging is configurable; see the sketch after this answer.
Yes. The FSImage is updated in main memory, not on disk, while the EditLog is updated on disk with the new transactions.
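If EditLog growth between merges is a concern, the checkpoint interval can be tightened. A hedged sketch, assuming Hadoop 2.x+ property names (dfs.namenode.checkpoint.period, default 3600 seconds, and dfs.namenode.checkpoint.txns, default 1,000,000 transactions) and hadoop-common on the classpath:

```java
import org.apache.hadoop.conf.Configuration;

// Reads the settings that control how often the SNN/CN merges the EditLog
// into the FSImage (assumes hadoop-common is available and Hadoop 2.x+
// property names; older releases used fs.checkpoint.* keys instead).
public class CheckpointConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Checkpoint at least this often (seconds)...
        long periodSecs = conf.getLong("dfs.namenode.checkpoint.period", 3600);
        // ...or sooner, once this many uncheckpointed transactions accumulate.
        long txnLimit = conf.getLong("dfs.namenode.checkpoint.txns", 1000000);
        System.out.println("checkpoint period (s): " + periodSecs);
        System.out.println("checkpoint txn limit:  " + txnLimit);
    }
}
```

Lowering either value makes checkpoints more frequent, keeping the EditLog small at the cost of more merge work on the SNN/CN.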