How do I recover ZooKeeper from java.io.EOFException after a server crash?

Asked: 2017-05-27 13:51:29

Tags: apache-zookeeper

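How can I recover from the following error, which started occurring after a server crash? ZooKeeper fails to start, and the following messages appear repeatedly in the log.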

2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib 
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:java.io.tmpdir=/tmp 
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:java.compiler=<NA> 
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:os.name=Linux 
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:os.arch=amd64 
2017-05-27 01:02:08,073 [myid:] - INFO [main:Environment@100] - Server environment:os.version=3.10.0-514.16.1.el7.x86_64 
2017-05-27 01:02:08,073 [myid:] - INFO [main:Environment@100] - Server environment:user.name=zookeeper 
2017-05-27 01:02:08,073 [myid:] - INFO [main:Environment@100] - Server environment:user.home=/opt/zookeeper 
2017-05-27 01:02:08,073 [myid:] - INFO [main:Environment@100] - Server environment:user.dir=/ 
2017-05-27 01:02:08,074 [myid:] - INFO [main:ZooKeeperServer@829] - tickTime set to 2000 
2017-05-27 01:02:08,074 [myid:] - INFO [main:ZooKeeperServer@838] - minSessionTimeout set to -1 
2017-05-27 01:02:08,074 [myid:] - INFO [main:ZooKeeperServer@847] - maxSessionTimeout set to -1 
2017-05-27 01:02:08,080 [myid:] - INFO [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181 
2017-05-27 01:02:08,385 [myid:] - ERROR [main:Util@239] - Last transaction was partial. 
2017-05-27 01:02:08,400 [myid:] - ERROR [main:Util@239] - Last transaction was partial. 
2017-05-27 01:02:08,403 [myid:] - ERROR [main:Util@239] - Last transaction was partial. 
2017-05-27 01:02:08,403 [myid:] - ERROR [main:Util@239] - Last transaction was partial. 
2017-05-27 01:02:08,404 [myid:] - ERROR [main:Util@239] - Last transaction was partial. 
2017-05-27 01:02:08,404 [myid:] - ERROR [main:ZooKeeperServerMain@64] - Unexpected exception, exiting abnormally 
java.io.EOFException 
at java.io.DataInputStream.readInt(DataInputStream.java:392) 
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) 
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64) 
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:585) 
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:604) 
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:570) 
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:652) 
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:166) 
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223) 
at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:283) 
at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:410) 
at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:118) 
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:119) 
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:87) 
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:53) 
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) 
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)

Thanks, IPVP

3 answers:

Answer 0 (score: 15)

My solution was to find the zero-length log file in /hadoop/zookeeper/version-2 (or wherever your dataDir is) and delete it. After that, start ZooKeeper.
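One way to locate such a file before deleting it is sketched below. This is a minimal sketch, not an official ZooKeeper tool: the function name `list_empty_txn_logs` is hypothetical, and it assumes the transaction logs are regular files named `log.*` directly inside the `version-2` directory.

```shell
# Hypothetical helper: print every zero-length transaction log in the
# directory given as $1 (for example the version-2 directory under dataDir).
# Assumes logs are regular files named log.* at the top level of that directory.
list_empty_txn_logs() {
    find "$1" -maxdepth 1 -type f -name 'log.*' -size 0
}

# Example usage (path taken from the answer above; adjust to your dataDir):
# list_empty_txn_logs /hadoop/zookeeper/version-2
```

The function only lists candidates, so the output can be reviewed before anything is removed.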

Answer 1 (score: 10)

My solution was to find the last log file, which was 0 bytes long.

You can find it in the version-2 directory with:
ls -l -r --sort=time

-rw-r--r-- 1 chris chris  67108880 Jan 24 10:37 log.23c6a70
-rw-r--r-- 1 chris chris         0 Jan 24 10:37 log.23d3fb4

At first I tried deleting the snapshot and the last two log files. That also worked, but the state you end up with is "a bit" older.

-rw-r--r-- 1 chris chris  3685904 Jan 24 00:56 snapshot.23c6a6e

To be safe, you may have to delete the last snapshot file and the last log file together with the zero-length log file.

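By the way, the log file and the snapshot share the same hex pattern in their names (here the common prefix 23c6a):

log.23c6a70

snapshot.23c6a6e

They must match and be consistent; if they do not, that needs fixing.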
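To compare the two suffixes numerically rather than by eye, a small sketch follows. The function name `zxid_of` is hypothetical, and the sketch assumes the part after the last dot is a hexadecimal transaction id, which is how ZooKeeper appears to name these files.

```shell
# Hypothetical helper: print the hex suffix of a log.* or snapshot.* file
# name as a decimal number, so two files can be compared numerically.
# ASSUMPTION: everything after the last dot is a hexadecimal transaction id.
zxid_of() {
    name=${1##*/}      # strip any leading directory
    hex=${name##*.}    # keep only the part after the last dot
    echo $((0x$hex))
}

# Example usage with the file names from the listing above:
# echo $(( $(zxid_of log.23c6a70) - $(zxid_of snapshot.23c6a6e) ))
```

A small difference between the two values suggests the log starts shortly after the point the snapshot covers.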

Answer 2 (score: 5)

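It looks like you have hit a known Apache ZooKeeper bug. There are a couple of Apache JIRA issues related to it: ZOOKEEPER-1621 and ZOOKEEPER-2332. If you are interested in the root-cause analysis and some proposed fixes, see the comments on those issues.

Unfortunately, there is currently no Apache ZooKeeper release that contains a fix for the bug. Here are some possible workarounds you can try: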

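  1. Build your own local version of ZooKeeper with one of the patches attached to the linked JIRA issues applied. Note that the ZooKeeper community has not yet accepted these patches, so use them at your own risk.
  2. Delete the offending log file. The root cause of the problem is that a log file from the previous ZooKeeper run was written with an incomplete header. Since the header sits at the very start of the file and is itself incomplete, we can assume the log file contains no transaction data after that point. Deleting it should therefore be safe and should not cause any data loss.
  3. If it is easier, consider reformatting this ZooKeeper cluster. This may be a suitable solution if all the data in the ZooKeeper installation is ephemeral and does not need long-term persistence.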
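For the second workaround, a more cautious variant is to move the offending file aside instead of deleting it. The sketch below is an illustration only: the function name `quarantine_truncated_logs` and the backup directory are hypothetical, and the 16-byte header size (a 4-byte magic number, a 4-byte version and an 8-byte database id) is an assumption that should be checked against the FileHeader class of your ZooKeeper version. Stop ZooKeeper before running anything like this.

```shell
# Hypothetical helper: move transaction logs that are too short to hold a
# complete file header from the data directory ($1) into a backup directory ($2).
# ASSUMPTION: a complete header is 16 bytes (magic + version + dbid).
quarantine_truncated_logs() {
    data_dir=$1
    backup_dir=$2
    header_bytes=16
    mkdir -p "$backup_dir" || return 1
    for f in "$data_dir"/log.*; do
        [ -f "$f" ] || continue              # skip when nothing matches
        size=$(( $(wc -c < "$f") ))          # file size in bytes
        if [ "$size" -lt "$header_bytes" ]; then
            mv -- "$f" "$backup_dir"/ && echo "moved $f"
        fi
    done
}

# Example usage (both paths are placeholders):
# quarantine_truncated_logs /hadoop/zookeeper/version-2 /tmp/zk-truncated-logs
```

Keeping the moved files around makes it possible to put them back if the server still fails to start for a different reason.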