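I am trying to set up an HBase cluster with two masters and two region servers. My problem is that the region servers complain when they try to tell the master that they have started: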
2016-07-01 16:10:21,879 WARN [regionserver/nbd-hadoop-data1/153.77.130.27:60020] **regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.**
2016-07-01 16:10:24,879 INFO [regionserver/nbd-hadoop-data1/153.77.130.27:60020] **regionserver.HRegionServer: reportForDuty to master=0.0.0.0,60000,1467381897236 with port=60020, startcode=1467382178755**
2016-07-01 16:10:24,879 DEBUG [regionserver/nbd-hadoop-data1/153.77.130.27:60020] ipc.AbstractRpcClient: Use SIMPLE authentication for service RegionServerStatusService, sasl=false
2016-07-01 16:10:24,880 DEBUG [regionserver/nbd-hadoop-data1/153.77.130.27:60020] ipc.AbstractRpcClient: Connecting to /0.0.0.0:60000
2016-07-01 16:10:24,880 WARN [regionserver/nbd-hadoop-data1/153.77.130.27:60020] regionserver.HRegionServer: error telling master we are up
com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:223)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8982)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2270)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:894)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
The strange thing is that the port is opened on 0.0.0.0.
The master is waiting for the region servers:
2016-07-01 16:08:43,495 INFO [0.0.0.0:60000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 220970 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
But when I stop the regionserver, the master (via ZooKeeper) recognizes that the regionserver has gone offline:
2016-07-01 16:55:25,124 WARN [main-EventThread] zookeeper.RegionServerTracker: nbd-hadoop-data1,60020,1467384161702 is not online or isn't known to the master.The latter could be caused by a DNS misconfiguration.
2016-07-01 16:55:26,509 INFO [0.0.0.0:60000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 3023984 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
My HBase cluster configuration:
153.77.130.29 nbd-hadoop-nn1 - zookeeper, hdfs, hbase master
153.77.130.30 nbd-hadoop-nn2 - zookeeper, hdfs, hbase master
153.77.130.22 nbd-service - zookeeper
153.77.130.27 nbd-hadoop-data1 - hbase regionserver 1
153.77.130.28 nbd-hadoop-data2 - hbase regionserver 2
All machines have /etc/hosts set up as follows:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.0.1 nbd-hadoop-nn1
153.77.130.22 nbd-service
153.77.130.29 nbd-hadoop-nn1
153.77.130.30 nbd-hadoop-nn2
153.77.130.27 nbd-hadoop-data1
153.77.130.28 nbd-hadoop-data2
Master hbase-site.xml:
<property>
<name>hbase.master.port</name>
<value>60000</value>
</property>
<property>
<name>hbase.regionserver.global.memstore.lowerLimit</name>
<value>0.38</value>
</property>
<property>
<name>hbase.regionserver.global.memstore.upperLimit</name>
<value>0.4</value>
</property>
<property>
<name>hbase.regionserver.handler.count</name>
<value>60</value>
</property>
<property>
<name>hbase.regionserver.info.port</name>
<value>60030</value>
</property>
<property>
<name>hbase.regionserver.port</name>
<value>60020</value>
</property>
Region server hbase-site.xml:
<property>
<name>hbase.master.info.port</name>
<value>60010</value>
</property>
<property>
<name>hbase.master.port</name>
<value>60000</value>
</property>
<property>
<name>hbase.regionserver.global.memstore.lowerLimit</name>
<value>0.38</value>
</property>
<property>
<name>hbase.regionserver.global.memstore.upperLimit</name>
<value>0.4</value>
</property>
<property>
<name>hbase.regionserver.handler.count</name>
<value>60</value>
</property>
<property>
<name>hbase.regionserver.port</name>
<value>60020</value>
</property>
<property>
<name>hbase.regionserver.info.port</name>
<value>60030</value>
</property>
netstat -ntlp on the master nbd-hadoop-nn1 (shows port 60000 correctly opened on :::):
tcp 0 0 :::60000 :::* LISTEN 30839/java
netstat -ntlp on the region server nbd-hadoop-data1 shows that port 60020 is bound to localhost. I think this is the root of the problem:
tcp 0 0 ::ffff:127.0.0.1:60020 :::* LISTEN 22858/java
I cannot reach port 60020 on the region server from the master: telnet nbd-hadoop-data1 60020 fails with "connection refused".
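This may be the root of the problem, but I do not know how to reconfigure it. I have not found out why the region server opens the port on ::ffff:127.0.0.1:60020.
Thank you very much for any hints. If you need other logs or configuration files, I will provide them.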
Answer 0 (score: 0)
Problem solved. It was caused by the loopback entry in my /etc/hosts file that maps 127.0.0.1 to the machine's hostname.
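To spot this kind of entry on each node, a short script can scan the hosts file for non-localhost names mapped to a loopback address. This is only a minimal sketch: the function name `loopback_hostnames` is hypothetical, and the sample text is a shortened copy of the /etc/hosts from the question.

```python
import ipaddress


def loopback_hostnames(hosts_text):
    """Return (hostname, address) pairs where a non-localhost name maps to loopback."""
    flagged = []
    for raw in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            # Not a plain IP address in the first column; skip the line.
            continue
        if not addr.is_loopback:
            continue
        for name in fields[1:]:
            # Names such as localhost, localhost4.localdomain4 are expected here.
            if not name.startswith("localhost"):
                flagged.append((name, fields[0]))
    return flagged


# Shortened copy of the hosts file shown in the question.
sample = """\
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.0.1 nbd-hadoop-nn1
153.77.130.29 nbd-hadoop-nn1
153.77.130.27 nbd-hadoop-data1
"""

print(loopback_hostnames(sample))  # [('nbd-hadoop-nn1', '127.0.0.1')]
```

On a real node the text would come from reading /etc/hosts; any name the function returns is a candidate for removal from the loopback line.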
Answer 1 (score: 0)
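I ran into exactly the same problem!
In your case, the entry 127.0.0.1 nbd-hadoop-nn1 makes the hostname resolve to localhost.
Apparently HBase/ZooKeeper needs to know the machine's actual IP address in distributed mode.
I do not know the internals of HBase, but if you remove this entry it works like a charm! I have my own DNS server, so specifying the hostname is enough for me and I do not need the /etc/hosts file at all. In fact, I hit this problem because every machine in my cluster had a 127.0.0.1 localhost machine<n> entry in its /etc/hosts file. So thanks to @miky, I knew exactly where to look for the fix. My machine provisioning wrote hostname entries into /etc/hosts, and since I recently introduced a DNS server into my network, it was time to drop that practice.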
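A quick way to confirm the fix on a node is to check what its hostname resolves to. The sketch below uses Python's standard resolver call; the helper names `resolved_addresses` and `resolves_to_loopback` are hypothetical, and the JVM's own resolution may not behave identically, so treat the result as a hint rather than proof.

```python
import ipaddress
import socket


def resolved_addresses(host):
    """Return the sorted set of addresses the system resolver gives for host."""
    infos = socket.getaddrinfo(host, None)
    return sorted({info[4][0] for info in infos})


def resolves_to_loopback(host):
    """True if any resolved address is a loopback address (127.0.0.0/8 or ::1)."""
    for addr in resolved_addresses(host):
        # Strip an IPv6 zone suffix such as %eth0 before parsing.
        if ipaddress.ip_address(addr.split("%", 1)[0]).is_loopback:
            return True
    return False


# IP literals resolve without any DNS or hosts lookup, so these are deterministic.
print(resolves_to_loopback("127.0.0.1"))      # True
print(resolves_to_loopback("153.77.130.27"))  # False
```

On a region server, calling `resolves_to_loopback(socket.gethostname())` should return False once the loopback entry for the hostname has been removed.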