Fatal error "Failed to become active master" when running HBase in cluster mode

Date: 2019-01-29 05:05:52

Tags: hadoop hbase

I have 4 nodes: one master and 3 slaves.

Master: *.*.*.18; slaves: *.*.*.12, *.*.*.104, *.*.*.36.

Hadoop configuration on the NameNode:

core-site.xml:

<configuration>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
</property>
</configuration>
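As a side note, on a multi-node cluster fs.defaultFS is normally set to an address that every DataNode can reach, not localhost. A sketch of what that might look like, assuming the master's IP 10.0.3.18 and port 9000 from this post:

```xml
<configuration>
<property>
    <name>fs.defaultFS</name>
    <!-- the master's reachable address rather than localhost,
         so DataNodes on other hosts can register with the NameNode -->
    <value>hdfs://10.0.3.18:9000</value>
</property>
</configuration>
```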

hdfs-site.xml:

<configuration>
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hduser/hadoop_store/hdfs/namenode</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hduser/hadoop_store/hdfs/datanode</value>
</property>
</configuration>

hadoop-env.sh:

export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
export HADOOP_PID_DIR=${HADOOP_PID_DIR}  # default to /tmp
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_IDENT_STRING=$USER

mapred-site.xml:

<configuration>
<property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
</property>
<property>
        <name>mapred.job.tracker</name>
        <value>localhost:54311</value>
</property>
</configuration>

slaves:

10.0.3.12
10.0.3.36
10.0.3.104

yarn-site.xml:

<configuration>

<!-- Site specific YARN configuration properties -->

<property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8050</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

</configuration>

On the slave nodes, the Hadoop configuration is:

yarn-site.xml:

<configuration>

<!-- Site specific YARN configuration properties -->

<property>
    <name>yarn.resourcemanager.address</name>
    <value>10.0.3.18:8050</value>
</property>
<property>
    <name>yarn.nodemanager.address</name>
    <value>localhost:8035</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

</configuration>

All other files on the slave nodes are identical to those on the master. As for the HBase configuration,

hbase-env.sh (on all nodes):

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m -XX:ReservedCodeCacheSize=256m"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m -XX:ReservedCodeCacheSize=256m"
export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers
export HBASE_MANAGES_ZK=true
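Incidentally, the PermSize options above were removed in Java 8, which is why the startup output later shows "ignoring option PermSize" warnings. A sketch of the same two exports without the obsolete flags (assuming the rest of hbase-env.sh stays as shown):

```shell
# PermGen was removed in Java 8, so -XX:PermSize/-XX:MaxPermSize are no-ops there
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:ReservedCodeCacheSize=256m"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:ReservedCodeCacheSize=256m"
```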

hbase-site.xml (on all nodes):

<configuration>
    <property>
            <name>hbase.rootdir</name>
            <value>hdfs://localhost:9000/hbase</value>
    </property>
    <property>
            <name>hbase.cluster.distributed</name>
            <value>true</value>
    </property>
    <property>
            <name>hbase.zookeeper.quorum</name>
            <value>10.0.3.18,10.0.3.12,10.0.3.104,10.0.3.36</value>
    </property>
    <property>
            <name>hbase.zookeeper.property.dataDir</name>
            <value>/home/hduser/Downloads/hbase/zookeeper</value>
    </property>
    <property>
            <name>hbase.zookeeper.property.clientPort</name>
            <value>2181</value>
    </property>
    <property>
            <name>dfs.replication</name>
            <value>3</value>
    </property>
    <property>
            <name>zookeeper.session.timeout</name>
            <value>1200000</value>
    </property>
    <property>
            <name>hbase.zookeeper.property.tickTime</name>
            <value>6000</value>
    </property>
</configuration>

except that on the slaves, localhost is changed to 10.0.3.18 (the NameNode's address).

regionservers:

10.0.3.12
10.0.3.104
10.0.3.36
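For what it's worth, hbase.rootdir has to name the same filesystem URI that fs.defaultFS serves, as seen from every node; localhost only resolves correctly on the master itself. A sketch of that one property, assuming the master address 10.0.3.18:9000 used elsewhere in this setup:

```xml
<property>
        <name>hbase.rootdir</name>
        <!-- must match fs.defaultFS from every node's point of view -->
        <value>hdfs://10.0.3.18:9000/hbase</value>
</property>
```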

I formatted the NameNode and started HDFS and YARN with start-dfs.sh and start-yarn.sh; the output was:

...succefully formatted namenode...
localhost: starting namenode, logging to /home/hduser/Downloads/hadoop/logs/hadoop-hduser-namenode-saichanda-OptiPlex-9020.out
10.0.3.12: starting datanode, logging to /home/hduser/Downloads/hadoop/logs/hadoop-hduser-datanode-aaron.out
10.0.3.36: starting datanode, logging to /home/hduser/Downloads/hadoop/logs/hadoop-hduser-datanode-dmacs-OptiPlex-9020.out
10.0.3.104: starting datanode, logging to /home/hduser/Downloads/hadoop/logs/hadoop-hduser-datanode-hadoop-104.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hduser/Downloads/hadoop/logs/hadoop-hduser-secondarynamenode-saichanda-OptiPlex-9020.out
starting yarn daemons
starting resourcemanager, logging to /home/hduser/Downloads/hadoop/logs/yarn-hduser-resourcemanager-saichanda-OptiPlex-9020.out
10.0.3.12: starting nodemanager, logging to /home/hduser/Downloads/hadoop/logs/yarn-hduser-nodemanager-aaron.out
10.0.3.36: starting nodemanager, logging to /home/hduser/Downloads/hadoop/logs/yarn-hduser-nodemanager-dmacs-OptiPlex-9020.out
10.0.3.104: starting nodemanager, logging to /home/hduser/Downloads/hadoop/logs/yarn-hduser-nodemanager-hadoop-104.out

When I run the jps command on the master:

28032 SecondaryNameNode
28481 Jps
28198 ResourceManager
27720 NameNode

When I run the jps command on the slaves:

11303 DataNode
11595 Jps
11436 NodeManager

Then I started HBase with ./start-hbase.sh. The output was:

10.0.3.12: running zookeeper, logging to /home/hduser/Downloads/hbase/bin/../logs/hbase-hduser-zookeeper-aaron.out
10.0.3.36: running zookeeper, logging to /home/hduser/Downloads/hbase/bin/../logs/hbase-hduser-zookeeper-dmacs-OptiPlex-9020.out
10.0.3.104: running zookeeper, logging to /home/hduser/Downloads/hbase/bin/../logs/hbase-hduser-zookeeper-hadoop-104.out
10.0.3.18: running zookeeper, logging to /home/hduser/Downloads/hbase/bin/../logs/hbase-hduser-zookeeper-saichanda-OptiPlex-9020.out
running master, logging to /home/hduser/Downloads/hbase/logs/hbase-hduser-master-saichanda-OptiPlex-9020.out
OpenJDK 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
10.0.3.12: running regionserver, logging to /home/hduser/Downloads/hbase/bin/../logs/hbase-hduser-regionserver-aaron.out
10.0.3.36: running regionserver, logging to /home/hduser/Downloads/hbase/bin/../logs/hbase-hduser-regionserver-dmacs-OptiPlex-9020.out
10.0.3.104: running regionserver, logging to /home/hduser/Downloads/hbase/bin/../logs/hbase-hduser-regionserver-hadoop-104.out
10.0.3.12: OpenJDK 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
10.0.3.12: OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
10.0.3.36: OpenJDK 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
10.0.3.36: OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
10.0.3.104: OpenJDK 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
10.0.3.104: OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0

When I run jps on the NameNode:

28032 SecondaryNameNode
28821 HQuorumPeer
29126 Jps
28198 ResourceManager
27720 NameNode

When I run jps on the slaves:

11776 HRegionServer
11669 HQuorumPeer
11303 DataNode
11899 Jps
11436 NodeManager

What I observe is that HMaster is not running on the NameNode. Can anyone help me understand why HMaster crashes? After a while, even the NodeManagers on the slaves crash. I also observed that when I shut down HBase, the HRegionServers on the slaves do not shut down; they keep running even after I issue stop-hbase.sh on the master node. The main warnings and errors observed in my logs are as follows.

hadoop-namenode.log: this exception occurs multiple times...

java.io.IOException: File /hbase/.tmp/hbase.version could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.

hadoop-secondary-namenode.log: this error occurs multiple times...

ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint
java.io.IOException: Inconsistent checkpoint fields.

No errors were found in yarn-resourcemanager.log.

As for the HBase logs, in hbase-master.log:

 FATAL [saichanda-OptiPlex-9020:16000.activeMasterManager] master.HMaster: Failed to become active master
File /hbase/.tmp/hbase.version could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.

In hbase-zookeeper.log: I only see this line; there are no errors in the log.

2019-01-29 10:09:49,431 INFO  [main] server.NIOServerCnxnFactory: binding to port 0.0.0.0/0.0.0.0:2181

On one of the slaves, regionserver.log:

 client.ZooKeeperRegistry: ClusterId read in ZooKeeper is null

On one of the slaves, hadoop-datanode.log gives the following warning multiple times.

WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: localhost/127.0.0.1:9000
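A quick way to confirm this kind of connectivity problem, independently of Hadoop, is to test whether the configured NameNode address is actually reachable from a slave. A minimal sketch in Python (the host 10.0.3.18 and port 9000 are the values from this post; run it on a slave node):

```python
import socket

def reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # On a slave, "localhost:9000" points at the slave itself, where no
    # NameNode is listening -- matching the DataNode warning above.
    for host in ("localhost", "10.0.3.18"):
        print(host, reachable(host, 9000))
```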

Among all the warnings and errors above, I feel the critical one is the error in hbase-master.log: the file could only be replicated to 0 nodes instead of minReplication (=1). Please help me resolve this issue.

Also, when I finally run the hbase shell, I get the error:

ERROR: Can't get master address from ZooKeeper; znode data == null

Thanks.

0 Answers:

There are no answers yet.