HBase MasterNotRunningException even though HMaster, RegionServer, and ZooKeeper are running

Time: 2013-05-18 07:25:18

Tags: java hadoop hbase hdfs apache-zookeeper

I have started HBase and all of the daemons are running.

 $ jps
8482 HQuorumPeer
25105 RemoteMavenServer
9133 SecondaryNameNode
11883 HRegionServer
13793 Jps
8545 NameNode
8572 HMaster
11519 Main
25029 Main
8851 DataNode
9435 RunJar

Now let's try to list the tables:

hbase(main):004:0* list
        TABLE                                                                                                                                                   

ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 times

Here is some help for this command:
List all tables in hbase. Optional regular expression parameter could
be used to filter the output. Examples:

Tail of the master log:

2013-05-17 22:48:35,609 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=localhost,60020,1368856115352

Tail of the ZooKeeper log:

$ tail *zoo*.log
2013-05-18 00:14:27,651 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /127.0.0.1:49826
2013-05-18 00:14:27,652 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /127.0.0.1:49826
2013-05-18 00:14:27,666 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x13eb59ceb22001e with negotiated timeout 180000 for client /127.0.0.1:49826

Tail of the region server log:

2013-05-18 00:08:35,416 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=2.03 MB, free=244.85 MB, max=246.88 MB, blocks=0, accesses=0, hits=0, hitRatio=0cachingAccesses=0, cachingHits=0, cachingHitsRatio=0evictions=0, evicted=0, evictedPerRun=NaN
2013-05-18 00:13:35,416 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=2.03 MB, free=244.85 MB, max=246.88 MB, blocks=0, accesses=0, hits=0, hitRatio=0cachingAccesses=0, cachingHits=0, cachingHitsRatio=0evictions=0, evicted=0, evictedPerRun=NaN
2013-05-18 00:18:35,416 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=2.03 MB, free=244.85 MB, max=246.88 MB, blocks=0, accesses=0, hits=0, hitRatio=0cachingAccesses=0, cachingHits=0, cachingHitsRatio=0evictions=0, evicted=0, evictedPerRun=NaN

More details (in response to @roman below). Safe mode is already off.

fsck gives:

hadoop fsck /

.Status: HEALTHY
 Total size:    321466989 B
 Total dirs:    412
 Total files:   446
 Total blocks (validated):  355 (avg. block size 905540 B)
 Minimally replicated blocks:   355 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:   334 (94.08451 %)
 Mis-replicated blocks:     0 (0.0 %)
 Default replication factor:    3
 Average block replication: 1.0
 Corrupt blocks:        0
 Missing replicas:      1109 (312.39438 %)
 Number of data-nodes:      1
 Number of racks:       1
FSCK ended at Sun May 19 13:09:14 PDT 2013 in 147 milliseconds
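
One thing that stands out in this report is that the default replication factor is 3 while there is only one datanode, which would account for nearly every block being listed as under-replicated without the filesystem being unhealthy. A minimal sketch for pulling that comparison out of an fsck report is below; the function name check_replication is made up for illustration, and it assumes the Hadoop 1.x report layout shown above.

```shell
# Sketch: read an fsck report on stdin and flag the case where the default
# replication factor is larger than the number of datanodes.
# check_replication is a hypothetical helper name, not part of Hadoop.
check_replication() {
  awk -F: '
    /Default replication factor/ { gsub(/[ \t]/, "", $2); rep = $2 }
    /Number of data-nodes/       { gsub(/[ \t]/, "", $2); nodes = $2 }
    END {
      # Either field missing: the report layout is not what we assumed.
      if (rep == "" || nodes == "") { print "unknown"; exit 2 }
      if (rep + 0 > nodes + 0) print "replication " rep " > datanodes " nodes
      else print "ok"
    }'
}

# Usage (assumes the hadoop CLI is on PATH):
#   hadoop fsck / | check_replication
```

If it reports that replication exceeds the datanode count, the under-replication figures are expected on a single-node setup and are probably unrelated to the MasterNotRunningException.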

However, as you suspected, the HBase web UI is not running on port 60030. I see no errors in the HBase logs that would explain why.
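
To check this without a browser, one option is to look for a listening socket on the UI ports. In HBase releases of this era the defaults are, as far as I know, 60010 for the master UI (hbase.master.info.port) and 60030 for the region server UI (hbase.regionserver.info.port), so it may be worth checking both. The sketch below assumes Linux-style netstat -n -a output, and port_listening is a made-up helper name.

```shell
# Sketch: read `netstat -n -a` output on stdin and succeed if some TCP socket
# is in LISTEN state on the given port. Assumes the Linux column layout:
#   proto recv-q send-q local-address foreign-address state
# port_listening is a hypothetical helper name.
port_listening() {
  awk -v p="$1" '
    $6 == "LISTEN" {
      # The port is the last colon-separated piece (also works for ":::2181").
      n = split($4, parts, ":")
      if (parts[n] == p) found = 1
    }
    END { exit(found ? 0 : 1) }'
}

# Usage:
#   for p in 60010 60030; do
#     if netstat -n -a | port_listening "$p"; then
#       echo "port $p: listening"
#     else
#       echo "port $p: not listening"
#     fi
#   done
```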

More info for @roman: hbase hbck simply times out with MasterNotRunningException:

stephenb@gondolin:/shared$ hbase hbck 
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.3-1240972, built on 02/06/2012 10:48 GMT
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:host.name=gondolin
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_37
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:java.home=/shared/jdk1.6.0_37/jre
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/shared/hadoop-1.0.3/libexec/../lib/native/Linux-amd64-64:/shared/hbase/lib/native/Linux-amd64-64
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:os.version=3.2.0-39-generic
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:user.name=stephenb
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/stephenb
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:user.dir=/shared
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
  13/05/19 13:16:16 INFO zookeeper.ClientCnxn: Opening socket connection to server /127.0.0.1:2181
  13/05/19 13:16:16 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 24642@gondolin
  13/05/19 13:16:16 WARN client.ZooKeeperSaslClient: SecurityException: java.lang.SecurityException: Unable to locate a login configuration occurred when trying to find JAAS configuration.
  13/05/19 13:16:16 INFO client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
  13/05/19 13:16:16 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
  13/05/19 13:16:16 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x13eb59ceb22002f, negotiated timeout = 180000
  13/05/19 13:17:27 INFO client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x13eb59ceb22002f
  13/05/19 13:17:27 INFO zookeeper.ZooKeeper: Session: 0x13eb59ceb22002f closed
  13/05/19 13:17:27 INFO zookeeper.ClientCnxn: EventThread shut down
  13/05/19 13:17:27 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
  13/05/19 13:17:27 INFO zookeeper.ClientCnxn: Opening socket connection to server /127.0.0.1:2181
  13/05/19 13:17:27 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 24642@gondolin
  13/05/19 13:17:27 WARN client.ZooKeeperSaslClient: SecurityException: java.lang.SecurityException: Unable to locate a login configuration occurred when trying to find JAAS configuration.
  13/05/19 13:17:27 INFO client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
  13/05/19 13:17:27 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
  13/05/19 13:17:27 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x13eb59ceb220030, negotiated timeout = 180000
  13/05/19 13:18:39 INFO client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x13eb59ceb220030
  13/05/19 13:18:39 INFO zookeeper.ZooKeeper: Session: 0x13eb59ceb220030 closed
  13/05/19 13:18:39 INFO zookeeper.ClientCnxn: EventThread shut down
  13/05/19 13:18:39 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
  13/05/19 13:18:39 INFO zookeeper.ClientCnxn: Opening socket connection to server /127.0.0.1:2181
  13/05/19 13:18:39 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 24642@gondolin
  13/05/19 13:18:39 WARN client.ZooKeeperSaslClient: SecurityException: java.lang.SecurityException: Unable to locate a login configuration occurred when trying to find JAAS configuration.
  13/05/19 13:18:39 INFO client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
  13/05/19 13:18:39 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
  13/05/19 13:18:39 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x13eb59ceb220031, negotiated timeout = 180000
  13/05/19 13:18:51 DEBUG client.HConnectionManager$HConnectionImplementation: The connection to null was closed by the finalize method.
  13/05/19 13:18:51 DEBUG client.HConnectionManager$HConnectionImplementation: 
  13/05/19 13:29:18 INFO client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x13eb59ceb220039
    13/05/19 13:29:18 INFO zookeeper.ZooKeeper: Session: 0x13eb59ceb220039 closed
    13/05/19 13:29:18 INFO zookeeper.ClientCnxn: EventThread shut down
    Exception in thread "main" org.apache.hadoop.hbase.MasterNotRunningException: Retried 10 times
        at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:130)
        at org.apache.hadoop.hbase.util.HBaseFsck.connect(HBaseFsck.java:264)
        at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:3331)
        at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3192)

1 answer:

Answer 0 (score: 1)

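The HBase web UI isn't running, is it? I had something similar after a complete crash of a single-node pseudo-distributed cluster. HDFS was unable to leave safe mode.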

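  1. Check that HDFS is not in safe mode with hadoop dfsadmin -safemode get.
  2. If it is, force it out of safe mode manually with hadoop dfsadmin -safemode leave
  3. You should then see progress: at the very least, the HBase web UI should come up.
  4. Run an HDFS fsck: hadoop fsck / -move
  5. If all goes well, it is a good idea to run an hbase hbck check.
  6. Other tips you may need:

    • Check where the region server is bound with netstat -n -a (check the port in your configuration). In my case it turned out to be bound to the wrong interface. Also search the forums: Hadoop has known issues with binding and IPv6 (check this for example).
    • Use hadoop dfsadmin -safemode get to check that Hadoop has really left safe mode. HBase cannot start fully until it has.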
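
Steps 1 to 5 could be strung together roughly as follows. This is only a sketch: it assumes the Hadoop 1.x hadoop CLI and hbase are on PATH, the function names are made up, and it runs fsck read-only rather than with -move, since -move relocates corrupt files to /lost+found and is better added deliberately.

```shell
# Sketch of steps 1-5. safemode_is_on and recover_hdfs_then_check_hbase are
# hypothetical helper names; the hadoop and hbase commands are the ones
# named in the steps above.
safemode_is_on() {
  # `hadoop dfsadmin -safemode get` prints "Safe mode is ON" or "Safe mode is OFF".
  hadoop dfsadmin -safemode get 2>/dev/null | grep -q 'Safe mode is ON'
}

recover_hdfs_then_check_hbase() {
  # Steps 1-2: force HDFS out of safe mode only if it is actually in it.
  if safemode_is_on; then
    echo "HDFS is in safe mode; forcing exit"
    hadoop dfsadmin -safemode leave || return 1
  fi

  # Re-check, since HBase cannot start fully while HDFS is in safe mode.
  if safemode_is_on; then
    echo "HDFS is still in safe mode; not continuing" >&2
    return 1
  fi

  # Step 4: read-only filesystem check (append -move to relocate corrupt files).
  hadoop fsck /

  # Step 5: HBase consistency check.
  hbase hbck
}

# Usage:
#   recover_hdfs_then_check_hbase
```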