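HBase regionservers not communicating with master

Date: 2016-07-04 09:10:03

Tags: hadoop hbase

I am trying to set up an HBase cluster with two masters and two region servers. My problem is that the regionservers complain when they try to tell the master that they have started.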

2016-07-01 16:10:21,879 WARN  [regionserver/nbd-hadoop-data1/153.77.130.27:60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
2016-07-01 16:10:24,879 INFO  [regionserver/nbd-hadoop-data1/153.77.130.27:60020] regionserver.HRegionServer: reportForDuty to master=0.0.0.0,60000,1467381897236 with port=60020, startcode=1467382178755
2016-07-01 16:10:24,879 DEBUG [regionserver/nbd-hadoop-data1/153.77.130.27:60020] ipc.AbstractRpcClient: Use SIMPLE authentication for service RegionServerStatusService, sasl=false
2016-07-01 16:10:24,880 DEBUG [regionserver/nbd-hadoop-data1/153.77.130.27:60020] ipc.AbstractRpcClient: Connecting to /0.0.0.0:60000
2016-07-01 16:10:24,880 WARN  [regionserver/nbd-hadoop-data1/153.77.130.27:60020] regionserver.HRegionServer: error telling master we are up
com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:223)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8982)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2270)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:894)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)

Strangely, it opens the port on 0.0.0.0:

The master is waiting for the region servers:

2016-07-01 16:08:43,495 INFO  [0.0.0.0:60000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 220970 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.

But when I stop the regionserver, the master (via ZooKeeper) recognizes that the regionserver has gone offline:

2016-07-01 16:55:25,124 WARN  [main-EventThread] zookeeper.RegionServerTracker: nbd-hadoop-data1,60020,1467384161702 is not online or isn't known to the master.The latter could be caused by a DNS misconfiguration.
2016-07-01 16:55:26,509 INFO  [0.0.0.0:60000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 3023984 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.

My HBase cluster configuration:

153.77.130.29 nbd-hadoop-nn1 - zookeeper, hdfs, hbase master
153.77.130.30 nbd-hadoop-nn2 - zookeeper, hdfs, hbase master
153.77.130.22 nbd-service - zookeeper
153.77.130.27 nbd-hadoop-data1 - hbase regionserver 1
153.77.130.28 nbd-hadoop-data2 - hbase regionserver 2


All machines have /etc/hosts set up as follows:

127.0.0.1       localhost       localhost.localdomain localhost4 localhost4.localdomain4
::1     localhost       localhost.localdomain localhost6 localhost6.localdomain6

127.0.0.1       nbd-hadoop-nn1
153.77.130.22 nbd-service
153.77.130.29 nbd-hadoop-nn1
153.77.130.30 nbd-hadoop-nn2
153.77.130.27 nbd-hadoop-data1
153.77.130.28 nbd-hadoop-data2

Master hbase-site.xml:

<property>
      <name>hbase.master.port</name>
      <value>60000</value>
    </property>

    <property>
      <name>hbase.regionserver.global.memstore.lowerLimit</name>
      <value>0.38</value>
    </property>

    <property>
      <name>hbase.regionserver.global.memstore.upperLimit</name>
      <value>0.4</value>
    </property>

    <property>
      <name>hbase.regionserver.handler.count</name>
      <value>60</value>
    </property>

    <property>
      <name>hbase.regionserver.info.port</name>
      <value>60030</value>
    </property>

 <property>
      <name>hbase.regionserver.port</name>
      <value>60020</value>
    </property>

Region server hbase-site.xml:

 <property>
      <name>hbase.master.info.port</name>
      <value>60010</value>
    </property>

    <property>
      <name>hbase.master.port</name>
      <value>60000</value>
    </property>

    <property>
      <name>hbase.regionserver.global.memstore.lowerLimit</name>
      <value>0.38</value>
    </property>

    <property>
      <name>hbase.regionserver.global.memstore.upperLimit</name>
      <value>0.4</value>
    </property>

    <property>
      <name>hbase.regionserver.handler.count</name>
      <value>60</value>
    </property>
 <property>
      <name>hbase.regionserver.port</name>
      <value>60020</value>
    </property>

  <property>
      <name>hbase.regionserver.info.port</name>
      <value>60030</value>
    </property>

netstat -ntlp from the master:

nbd-hadoop-nn1 (shows port 60000 correctly open on :::):

tcp        0      0 :::60000                    :::*                        LISTEN      30839/java

netstat -ntlp from the region server:

nbd-hadoop-data1 shows that port 60020 is bound to localhost. I think this is the root of the problem:

tcp        0      0 ::ffff:127.0.0.1:60020      :::*                        LISTEN      22858/java

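I cannot reach port 60020 on the region server from the master: telnet nbd-hadoop-data1 60020 is refused. This may be the root of the problem, but I don't know how to reconfigure it. I have not found out why the region server opens the port on ::ffff:127.0.0.1:60020.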
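The telnet test described above can also be scripted. The following is only a minimal sketch using Python's standard socket module; the helper name can_connect is made up for illustration, and the hostname and port in the usage comment are the ones from this question. A listener bound only to 127.0.0.1, as in the netstat output above, would be expected to refuse connections arriving from another machine.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper, roughly equivalent to `telnet host port`.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


# Example, to be run on the master (values taken from the question):
#   can_connect("nbd-hadoop-data1", 60020)
```

Running the same check on the region server itself against 127.0.0.1 and against its real address (153.77.130.27 here) should show which interface the port is actually bound to.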

Thank you very much for any hints. If you need other logs or configuration files, I will provide them.

2 answers:

Answer 0: (score: 0)

Problem solved. It was caused by the loopback entry in my /etc/hosts file that maps 127.0.0.1 to the machine's hostname.
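To spot such an entry on each machine, a small script can scan the hosts file. This is a rough sketch, not part of HBase: the function name find_loopback_hostnames is invented for illustration, and treating every name that starts with "localhost" as harmless is an assumption.

```python
import ipaddress


def find_loopback_hostnames(hosts_text):
    """Return (address, name) pairs where a loopback address is mapped
    to a name that does not start with "localhost" (an assumed rule)."""
    suspicious = []
    for line in hosts_text.splitlines():
        # Drop comments, then split the line into address and names.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        address, names = fields[0], fields[1:]
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # first field is not a parsable IP address
        if not is_loopback:
            continue
        for name in names:
            if not name.startswith("localhost"):
                suspicious.append((address, name))
    return suspicious


# Example: find_loopback_hostnames(open("/etc/hosts").read())
```

Applied to the /etc/hosts shown in the question, this would flag the 127.0.0.1 nbd-hadoop-nn1 line.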

Answer 1: (score: 0)

I ran into exactly the same problem!

In your case, the entry 127.0.0.1 nbd-hadoop-nn1 makes the hostname resolve to localhost.
Apparently HBase/ZooKeeper needs to know the actual IP address in distributed mode.

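I don't know the internals of HBase, but if you remove this entry it will work like a charm! I have my own DNS server, so specifying the hostname is enough for me and I don't need to use the /etc/hosts file at all. In fact, I ran into this problem because every machine in the cluster had a 127.0.0.1 localhost machine<n> entry in its /etc/hosts file! So thanks to @miky, I knew exactly where to look for the solution. My machine provisioning sets up /etc/hosts with a hostname entry, and I recently introduced a DNS server into my network, so it is time to drop that practice!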
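Whether a hostname still resolves to a loopback address after editing /etc/hosts or switching to DNS can be checked through the system resolver. A minimal sketch, assuming Python's standard library; resolves_to_loopback is a hypothetical helper name, not an HBase tool.

```python
import ipaddress
import socket


def resolves_to_loopback(hostname):
    """Return True if any address the system resolver returns for
    hostname is a loopback address (hypothetical helper)."""
    for info in socket.getaddrinfo(hostname, None):
        address = info[4][0]
        # Strip an IPv6 zone id such as "%eth0" before parsing.
        if ipaddress.ip_address(address.split("%", 1)[0]).is_loopback:
            return True
    return False


# Example, to be run on each cluster node with its own hostname:
#   resolves_to_loopback(socket.gethostname())
```

If this returns True for a node's own hostname, the servers on that node may bind to or advertise a loopback address, which matches the symptom in the question.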