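I have two HBase clusters, and replication sometimes stops when the connection between them drops. The errors I see are related to the ZooKeeper client. I increased the ZooKeeper client retry setting in the HBase configuration from 3 to 50 and tested it: "zookeeper.recovery.retry", 50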
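For reference, the change amounts to something like the following entry. This is only a sketch: it assumes the value is set through the standard property format in hbase-site.xml on the region servers, and setting the same key programmatically on the configuration object should be equivalent.

```xml
<!-- Sketch of the setting described above; assumes it lives in hbase-site.xml -->
<property>
  <name>zookeeper.recovery.retry</name>
  <!-- raised from 3 to 50 -->
  <value>50</value>
</property>
```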
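This did not help: the retries still failed after the connection broke.
The only way we have found to get replication working again is to restart the region servers.
This looks like a bug in HBase replication. Is there a way to tell the region server to restart the replication ZooKeeper client once it has exhausted its retries, or is there some other solution?
Example error log: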
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:294)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:516)
    at org.apache.hadoop.hbase.replication.ReplicationPeersZKImpl.fetchSlavesAddresses(ReplicationPeersZKImpl.java:446)
    at org.apache.hadoop.hbase.replication.ReplicationPeersZKImpl.getRegionServersOfConnectedPeer(ReplicationPeersZKImpl.java:306)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.chooseSinks(ReplicationSinkManager.java:146)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.reportBadSink(ReplicationSinkManager.java:140)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:791)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:388)