HBase与ZooKeeper建立会话并立即关闭会话

时间:2014-09-18 08:52:40

标签: session hbase apache-zookeeper reconnect

我发现我们的RegionServers经常连接到ZooKeeper。他们似乎不断建立会话,关闭它并重新连接ZooKeeper。这是服务器端和客户端的日志。我不知道为什么会这样,以及如何处理它?我们正在使用HBase 0.94.11和ZooKeeper 3.4.4。

HBase RegionServer的日志:

2014-09-18,16:38:17,867 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=10.2.201.74:11000,10.2.201.73:11000,10.101.10.67:11000,10.101.10.66:11000,10.2.201.75:11000 sessionTimeout=30000 watcher=catalogtracker-on-org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@69d892a1
2014-09-18,16:38:17,868 INFO org.apache.zookeeper.client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
2014-09-18,16:38:17,868 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server lg-hadoop-srv-ct01.bj/10.2.201.73:11000. Will attempt to SASL-authenticate using Login Context section 'Client'
2014-09-18,16:38:17,868 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 11787@lg-hadoop-srv-st05.bj
2014-09-18,16:38:17,868 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to lg-hadoop-srv-ct01.bj/10.2.201.73:11000, initiating session
2014-09-18,16:38:17,870 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server lg-hadoop-srv-ct01.bj/10.2.201.73:11000, sessionid = 0x248782700e52b3c, negotiated timeout = 30000
2014-09-18,16:38:17,876 INFO org.apache.zookeeper.ZooKeeper: Session: 0x248782700e52b3c closed
2014-09-18,16:38:17,876 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2014-09-18,16:38:17,878 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 24

来自ZooKeeper服务器的日志:

2014-09-18,16:38:17,869 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: [myid:2] Accepted socket connection from /10.2.201.76:55621
2014-09-18,16:38:17,869 INFO org.apache.zookeeper.server.ZooKeeperServer: [myid:2] Client attempting to establish new session at /10.2.201.76:55621
2014-09-18,16:38:17,870 INFO org.apache.zookeeper.server.ZooKeeperServer: [myid:2] Established session 0x248782700e52b3c with negotiated timeout 30000 for client /10.2.201.76:55621
2014-09-18,16:38:17,872 INFO org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:2] Successfully authenticated client: authenticationID=hbase_srv/hadoop@XIAOMI.HADOOP;  authorizationID=hbase_srv/hadoop@XIAOMI.HADOOP.
2014-09-18,16:38:17,872 INFO org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:2] Setting authorizedID: hbase_srv
2014-09-18,16:38:17,872 INFO org.apache.zookeeper.server.ZooKeeperServer: [myid:2] adding SASL authorization for authorizationID: hbase_srv
2014-09-18,16:38:17,877 INFO org.apache.zookeeper.server.NIOServerCnxn: [myid:2] Closed socket connection for client /10.2.201.76:55621 which had sessionid 0x248782700e52b3c

1 个答案:

答案 0 :(得分:0)

最后我找到了根本原因。

是的,它是关于ReplicationSink的,我找到了日志,“2014-09-23,14:58:01,736 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:复制表online_miliao_recent”。

然后我查看相关代码,发现每次调用replicateEntries()时,它都会调用sharedHBaseAdmin.tableExists(table)。

sharedHBaseAdmin.tableExists()将创建一个新的CatalogTracker对象,它也是一个ZooKeeper客户端。

当此方法退出时,它将清理ZooKeeper客户端和会话。

因此,这个日志看起来很合理,因为Replication正在运行。但是tableExists()有点沉重,我认为每次复制enties都不应该调用它。我还注意到在0.94.11之后CatalogTracker不在ReplicationSink中,所以对于更高版本来说这不是问题。

如果我找到了从ReplicationSink中删除CatalogTracker的jira,那将会很棒: - )