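Hadoop / HBase - unable to use HDFS High Availability (failover)

Time: 2016-02-04 12:48:39

Tags: hbase hadoop2 failover

I am trying to build a Hadoop architecture with failover. My problem is that I cannot get the RegionServer to work correctly with HDFS HA. The RegionServer log contains the following error: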

java.io.IOException: Port 9000 specified in URI hdfs://HAcluster:9000 but host 'HAcluster' is a logical (HA) namenode and does not use port information.
at org.apache.hadoop.hdfs.NameNodeProxies.getFailoverProxyProviderClass(NameNodeProxies.java:396)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:134)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:510)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:453)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2433)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:166)
at org.apache.hadoop.hbase.regionserver.HRegionServer.startRegionServer(HRegionServer.java:2508)
at org.apache.hadoop.hbase.regionserver.HRegionServer.startRegionServer(HRegionServer.java:2492)
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:62)
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:85)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2543)
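
Read literally, the exception says that some client-side setting resolves to hdfs://HAcluster:9000, while a logical nameservice such as HAcluster must be addressed without a port. None of the files quoted in this post contains that URI, so it presumably comes from a file that is not shown. One candidate, and this is an assumption rather than something the post confirms, is fs.defaultFS in the core-site.xml visible to the RegionServer. A minimal sketch of a port-free value:

```xml
<!-- core-site.xml (hypothetical excerpt; this file is not shown in the post) -->
<property>
    <name>fs.defaultFS</name>
    <!-- Logical nameservice only: no port after HAcluster -->
    <value>hdfs://HAcluster</value>
</property>
```

The NameNode RPC ports stay in the dfs.namenode.rpc-address.HAcluster.* properties of hdfs-site.xml, which is where the configured failover proxy provider looks them up.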

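My components are listed below:

  • Hadoop: 2.7.1
  • HBase: 0.98.12
  • ZooKeeper: 3.4.6
  • Java: Oracle JDK 1.7_75

As for the architecture, I have 6 VMs:

  • 2 masters:
    • HDFS (NameNode)
    • YARN (ResourceManager)
    • HBase (Master)

One master is active and the other is on standby (in case the first one crashes). The standby master is simply a replica of the active master.

  • One slave:
    • HDFS (DataNode)
    • YARN (NodeManager)
    • HBase (RegionServer)
  • Three ZooKeeper nodes on dedicated servers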

Every component is in HA (High Availability) mode. To achieve this, I had to create logical clusters for HDFS and YARN.

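The configuration files below may help you understand the setup better:

hdfs-site.xml (where HAcluster is defined) - identical on the 3 servers, except for some properties outside the scope of HA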

<configuration>
<property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>the value is the number of the copy of the file in the file system</description>
</property>
<!-- High Availability Hadoop -->
<property>
    <name>dfs.nameservices</name>
    <value>HAcluster</value> <!-- HAcluster is consisted of SUNRAY009IV06 = MASTER 1 and SUNRAY009IV07 = MASTER 2 -->
    <final>true</final>
    <description>The name of your cluster which consists of Master 1 and Master 2</description>
</property>
<property>
    <name>dfs.ha.namenodes.HAcluster</name>
    <value>SUNRAY009IV06,SUNRAY009IV07</value> <!--SUNRAY009IV06 = MASTER 1, SUNRAY009IV07 = MASTER 2 -->
    <final>true</final>
    <description>The namenodes in your cluster</description>
</property>
<property>
    <name>dfs.namenode.rpc-address.HAcluster.SUNRAY009IV06</name>
    <value>SUNRAY009IV06:9000</value> <!--SUNRAY009IV06 = MASTER 1 -->
    <description>the RPC adress of your Master 1</description>
</property>
<property>
    <name>dfs.namenode.rpc-address.HAcluster.SUNRAY009IV07</name>
    <value>SUNRAY009IV07:9000</value> <!--SUNRAY009IV07 = MASTER 2 -->
    <description>the RPC adress of your Master 2</description>
</property>
<property>
    <name>dfs.namenode.http-address.HAcluster.SUNRAY009IV06</name>
    <value>SUNRAY009IV06:50070</value> <!--SUNRAY009IV06 = MASTER 1 -->
    <description>the HTTP adress of your Master 1</description>
</property>
<property>
    <name>dfs.namenode.http-address.HAcluster.SUNRAY009IV07</name>
    <value>SUNRAY009IV07:50070</value> <!--SUNRAY009IV07 = MASTER 2 -->
    <description>the HTTP adress of your Master 2</description>
</property>
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://SUNRAY009IV06:8485;SUNRAY009IV07:8485;SUNRAY009IV08:8485/HAcluster</value>
    <!--SUNRAY009IV06 = MASTER 1, SUNRAY009IV07 = MASTER 2, SUNRAY009IV08 = SLAVE 1 -->
    <description>the location of the shared storage directory</description>
</property>
<property>
    <name>dfs.client.failover.proxy.provider.HAcluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    <description>the Java class that HDFS clients use to contact the Active NameNode</description>
</property>
<property> 
    <name>dfs.permissions</name>
    <value>false</value>
    <description>disable hdfs permissions</description>
</property>
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
    <description>The backup is defined as automatic</description>
</property>
<property>
    <name>ha.zookeeper.quorum</name>
    <value>SUNRAY009IV09:2181,SUNRAY009IV11:2181,SUNRAY009IV13:2181</value>
    <description>The list of your Zookeeper servers in your Hadoop architecture</description>
    <!--SUNRAY009IV09 = ZOOKEEPER 1, SUNRAY009IV11 = ZOOKEEPER 2, SUNRAY009IV13 = ZOOKEEPER 3 -->
</property>
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
    <description> method which will be used to fence the Active NameNode during a failover. 
    sshfence = SSH to the Active NameNode and kill the process</description>
</property>
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoopuser/.ssh/id_rsa</value>
    <description>List of SSH private key files</description>
</property>
<property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>3000</value>
    <description>timeout</description>
</property>
</configuration>

yarn-site.xml - identical on the 3 servers, except for some properties outside the scope of HA

<configuration>

<!-- Site specific YARN configuration properties -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>HAyarn</value>
    <!--HAyarn is consisted of SUNRAY009IV06 = MASTER 1 and SUNRAY009IV07 = MASTER 2 -->
    <description>The name of the Resource Manager</description>
</property>
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
    <description>to enable YARN logs</description>
</property>
<property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
    <description>Where to store logs in HDFS</description>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>shuffle service that needs to be set for Map Reduce to run</description>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    <description>mapreduce_shuffle service to implement</description>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>HAyarn:8031</value>
    <!--HAyarn is consisted of SUNRAY009IV06 = MASTER 1 and SUNRAY009IV07 = MASTER 2 -->
    <description>host is the hostname of the resource manager and  the port is the port on which the NodeManagers contact the Resource Manage</description>
</property>

<!-- High Availability YARN -->
<property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>HAyarn</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>SUNRAY009IV06</value>
    <!--SUNRAY009IV06 = MASTER 1, SUNRAY009IV07 = MASTER 2-->
    <description>The hostname of MASTER 1</description>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>SUNRAY009IV07</value>
    <!--SUNRAY009IV06 = MASTER 1, SUNRAY009IV07 = MASTER 2-->
    <description>The hostnameof MASTER 2</description>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>SUNRAY009IV06:8088</value>
    <!--SUNRAY009IV06 = MASTER 1, SUNRAY009IV07 = MASTER 2-->
    <description>The Web application address of MASTER 1</description>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>SUNRAY009IV07:8088</value>
    <!--SUNRAY009IV06 = MASTER 1, SUNRAY009IV07 = MASTER 2-->
    <description>The Web application address of MASTER 2</description>
</property>
<property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>SUNRAY009IV09:2181,SUNRAY009IV11:2181,SUNRAY009IV13:2181</value>
    <description>The list of your Zookeeper servers in your Hadoop architecture</description>
    <!--SUNRAY009IV09 = ZOOKEEPER 1, SUNRAY009IV11 = ZOOKEEPER 2, SUNRAY009IV13 = ZOOKEEPER 3 -->
</property>
<property>
    <name>yarn.client.failover-proxy-provider.HAyarn</name>
    <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
    <description>the class used for the YARN failover</description>
</property>
</configuration>

hbase-site.xml (identical on the 3 servers)

<property>
    <name>hbase.rootdir</name>
    <value>hdfs://HAcluster/hbase</value> <!--HAcluster is consisted of SUNRAY009IV06 = MASTER 1 and SUNRAY009IV07 = MASTER 2 -->
    <description>The directory shared by RegionServers (slaves)</description>
</property>
<property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in</description>
</property>
<property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
    <description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.</description>
</property>
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>SUNRAY009IV09,SUNRAY009IV11,SUNRAY009IV13</value>
    <description>The list of your Zookeeper servers in your Hadoop architecture</description>
    <!--SUNRAY009IV09 = ZOOKEEPER 1, SUNRAY009IV11 = ZOOKEEPER 2, SUNRAY009IV13 = ZOOKEEPER 3 -->
</property>
<property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/zookeeper</value>
    <description>Property from ZooKeeper's config zoo.cfg. The directory where the snapshot is stored.</description>
</property>
<property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase</value>
    <description>The root znode that will contain all the znodes created/used byHBase</description>
</property>

hbase-env.sh - only the relevant part

#Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false

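Before posting, I researched the problem on Google. Nothing I found worked for me, so I tried a few things:

  • I tried changing the HBase version: I downloaded the latest one (0.98.17-hadoop2). No effect.
  • I tried starting from scratch, meaning: formatting HDFS, deleting the ZooKeeper metadata, deleting the znodes, etc.
  • I tried replacing hdfs://HAcluster/hbase with hdfs://MASTER1:9000/hbase on every server that runs HBase. No effect.

So I am a bit lost, because I still get the error even without the logical cluster.

PS: Everything else works as expected: the DataNode / NodeManager connect to the active NameNode / ResourceManager (checked with the web apps). The HBase Master also runs fine, and the backup master is taken into account as well (checked with the web app). That is also why I do not understand why I get this error.

I hope I have given you everything needed to understand my problem.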

0 answers:

No answers yet