I am having trouble starting HDFS. My cluster consists of 3 nodes on Google Cloud Platform: node 1 is the active NameNode, node 2 is supposed to be the standby NameNode and a DataNode, and node 3 is a DataNode only.
When I start HDFS, the active NameNode comes up and runs fine, but the standby NameNode does not start. If I open the UI at node1:50070, everything looks good, but node2:50070 refuses to connect ("This site can't be reached"). Checking the DataNode UIs, node2:50075 shows NameNode 1 in RUNNING state and NameNode 2 stuck in CONNECTING state (with the heartbeat count increasing indefinitely).
The curious part is that we have another test environment with exactly the same HDFS configuration, and it has no such problem. The log containing the error is below:
2019-03-05 14:45:02,481 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: Failed to load image from FSImageFile(file=/persist/hdfs/namenode/current/fsimage_0000000000000951913, cpktTxId=0000000000000951913)
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:208)
at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:220)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:931)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:915)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:788)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:718)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:316)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1044)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:707)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:635)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:696)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:906)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:885)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1626)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1694)
2019-03-05 14:45:02,534 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock held for 12202 ms via
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:252)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1572)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1073)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:707)
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:635)
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:696)
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:906)
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:885)
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1626)
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1694)
Number of suppressed write-lock reports: 0
Longest write-lock held interval: 12202
2019-03-05 14:45:02,535 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
java.io.IOException: Failed to load FSImage file, see error(s) above for more info.
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:732)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:316)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1044)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:707)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:635)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:696)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:906)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:885)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1626)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1694)
2019-03-05 14:45:02,555 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@core-acc2:50070
2019-03-05 14:45:02,563 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2019-03-05 14:45:02,564 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2019-03-05 14:45:02,565 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2019-03-05 14:45:02,565 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.io.IOException: Failed to load FSImage file, see error(s) above for more info.
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:732)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:316)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1044)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:707)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:635)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:696)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:906)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:885)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1626)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1694)
2019-03-05 14:45:02,568 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2019-03-05 14:45:02,595 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at
(I truncated the log when copy-pasting.)
As you can see, the error itself is clear: NameNode 2 fails to load the fsimage. However, I am not sure whether the cause lies in the fsimage file itself or in something more, since I see both the "Premature EOF" exception and a write-lock-held-interval message that I have never seen before.
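If it helps narrow this down, I assume one could check whether the fsimage file itself is readable with the offline image viewer, and, if only the standby's copy is damaged, re-copy the namespace from the active NameNode. This is only an untested sketch (the fsimage path is taken from the log above):

```shell
# Try to parse the fsimage with the offline image viewer (hdfs oiv).
# If the file is truncated/corrupt, this should fail with a similar EOF error.
hdfs oiv -i /persist/hdfs/namenode/current/fsimage_0000000000000951913 \
         -o /tmp/fsimage.xml -p XML

# If only the standby's copy is bad, re-sync its namespace from the active
# NameNode (run on node 2 while the active NameNode is up).
hdfs namenode -bootstrapStandby
```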
What I have found/tried:
I read somewhere that this could be caused by a timeout, or by multiple threads spawned by Java, or something along those lines. Do you have any suggestions?
Thanks.