HA NameNode failover fails when the whole server is killed

Time: 2016-09-27 12:35:42

Tags: hadoop hdfs apache-zookeeper high-availability failover

I'm trying to set up an HA HDFS cluster as described in the Apache documentation: https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html

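When I kill the process of the active NameNode, failover works fine. But when I cause a bigger failure, such as unplugging the network cable, the standby NameNode does not become active. The standby node tries to transition the active node to standby, but because that machine is down, the SSH connection cannot be established. Is there something I've missed?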

This is what hadoop-hduser-zkfc-nn1.log says:

2016-09-27 15:03:11,316 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
2016-09-27 15:03:20,316 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
2016-09-27 15:03:20,317 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to nn2.example.org...
2016-09-27 15:03:20,317 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to nn2.example.org port 22
2016-09-27 15:03:23,315 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to connect to nn2.example.org as user hduser
com.jcraft.jsch.JSchException: java.net.NoRouteToHostException: No route to host
        at com.jcraft.jsch.Util.createSocket(Util.java:386)
        at com.jcraft.jsch.Session.connect(Session.java:182)
        at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
        at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:532)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:910)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:809)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2016-09-27 15:03:23,315 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
2016-09-27 15:03:23,315 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
2016-09-27 15:03:23,315 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.RuntimeException: Unable to fence NameNode at nn2.example.org/172.16.1.188:8040
        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:533)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:910)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:809)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2016-09-27 15:03:23,316 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2016-09-27 15:03:23,325 INFO org.apache.zookeeper.ZooKeeper: Session: 0x3576b5e84d4003a closed
2016-09-27 15:03:24,329 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=nn1.example.org:2181,nn2.example.org:2181,dn1.example.org:2181,dn2.example.org:2181,dn3.example.org:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3bf4eda6
2016-09-27 15:03:24,335 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server nn1.example.org/172.16.1.187:2181. Will not attempt to authenticate using SASL (unknown error)
2016-09-27 15:03:24,342 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to nn1.example.org/172.16.1.187:2181, initiating session
2016-09-27 15:03:24,379 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server nn1.example.org/172.16.1.187:2181, sessionid = 0x1576b95292f0001, negotiated timeout = 5000
2016-09-27 15:03:24,386 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2016-09-27 15:03:24,403 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2016-09-27 15:03:24,407 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2016-09-27 15:03:24,425 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a0a68612d636c757374657212036e6e321a1668646d617374657230322e7265696368656c742e646520e83e28d33e
2016-09-27 15:03:24,444 INFO org.apache.hadoop.ha.ZKFailoverController: Should fence: NameNode at nn2.example.org/172.16.1.188:8040
2016-09-27 15:03:27,315 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn2.example.org/172.16.1.188:8040. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
2016-09-27 15:03:29,315 WARN org.apache.hadoop.ha.FailoverController: Unable to gracefully make NameNode at nn2.example.org/172.16.1.188:8040 standby (unable to connect)
java.net.NoRouteToHostException: No Route to Host from  nn1.example.org/172.16.1.187 to nn2.example.org:8040 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:758)
        at org.apache.hadoop.ipc.Client.call(Client.java:1479)
        at org.apache.hadoop.ipc.Client.call(Client.java:1412)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy9.transitionToStandby(Unknown Source)
        at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToStandby(HAServiceProtocolClientSideTranslatorPB.java:112)
        at org.apache.hadoop.ha.FailoverController.tryGracefulFence(FailoverController.java:172)
        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:514)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:910)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:809)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.net.NoRouteToHostException: No route to host
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
        at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
        at org.apache.hadoop.ipc.Client.call(Client.java:1451)
        ... 14 more
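The stack traces show the order in which the ZKFC on nn1 tries to fence the old active NameNode: first a graceful transitionToStandby RPC (FailoverController.tryGracefulFence, called from ZKFailoverController.doFence), then each method configured for NodeFencer, which here is only SshFenceByTcpPort. Below is a simplified Java sketch of that control flow, assuming each attempt can be modeled as a boolean success/failure. The names FencingSketch, FenceMethod and fenceOldActive are hypothetical and are not Hadoop's actual API. It illustrates why unplugging the network stalls failover: when nn2 is unreachable, every attempt fails, so the standby does not become active and instead goes back to ZooKeeper and retries, as the "Trying to re-establish ZK session" line shows.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical, simplified model of the fencing sequence seen in the ZKFC log.
 * This is NOT Hadoop's implementation; attempts are modeled as booleans.
 */
public class FencingSketch {

    /** A named fencing attempt; the supplier returns true if fencing succeeded. */
    public static final class FenceMethod {
        public final String name;
        public final BooleanSupplier attempt;

        public FenceMethod(String name, BooleanSupplier attempt) {
            this.name = name;
            this.attempt = attempt;
        }
    }

    /**
     * Returns true if the old active NameNode may be considered fenced.
     * 1. Try a graceful transition to standby (fails if the host is unreachable).
     * 2. Otherwise try each configured method in order; the first success wins.
     * 3. If every method fails, return false: the caller must not become active.
     */
    public static boolean fenceOldActive(BooleanSupplier gracefulTransitionToStandby,
                                         List<FenceMethod> methods) {
        if (gracefulTransitionToStandby.getAsBoolean()) {
            return true;
        }
        for (int i = 0; i < methods.size(); i++) {
            FenceMethod m = methods.get(i);
            // Messages loosely paraphrase the NodeFencer log lines.
            System.out.printf("Trying method %d/%d: %s%n", i + 1, methods.size(), m.name);
            if (m.attempt.getAsBoolean()) {
                return true;
            }
            System.out.printf("Fencing method %s was unsuccessful.%n", m.name);
        }
        System.out.println("Unable to fence service by any configured method.");
        return false;
    }

    public static void main(String[] args) {
        // Scenario from the question: nn2 is off the network, so both the
        // graceful RPC and the SSH-based method fail.
        BooleanSupplier unreachable = () -> false;
        boolean fenced = fenceOldActive(unreachable,
                List.of(new FenceMethod("sshfence", unreachable)));
        System.out.println(fenced ? "becoming active" : "staying standby, will retry election");
    }
}
```

Running main walks through the single configured method, reports that it was unsuccessful, and ends with "staying standby, will retry election", mirroring the loop visible in the log.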
