Spark, Kerberos, yarn-cluster -> connecting to HBase

Date: 2017-05-24 17:12:01

Tags: apache-spark hbase kerberos

We are facing an issue on a Kerberos-enabled Hadoop cluster.

We are trying to run a streaming job in yarn-cluster mode that interacts with Kafka (direct stream) and HBase.

Somehow we are unable to connect to HBase in cluster mode. We use a keytab to log in to HBase.

This is how we submit the job:

spark-submit --master yarn-cluster --keytab "dev.keytab" --principal "dev@IO-INT.COM"  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j_executor_conf.properties -XX:+UseG1GC" --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j_driver_conf.properties -XX:+UseG1GC" --conf spark.yarn.stagingDir=hdfs:///tmp/spark/ --files "job.properties,log4j_driver_conf.properties,log4j_executor_conf.properties" service-0.0.1-SNAPSHOT.jar job.properties

Connecting to HBase:

def getHbaseConnection(properties: SerializedProperties): (Connection, UserGroupInformation) = {

    val config = HBaseConfiguration.create()
    config.set("hbase.zookeeper.quorum", HBASE_ZOOKEEPER_QUORUM_VALUE)
    config.set("hbase.zookeeper.property.clientPort", "2181")
    config.set("hadoop.security.authentication", "kerberos")
    config.set("hbase.security.authentication", "kerberos")
    config.set("hbase.cluster.distributed", "true")
    config.set("hbase.rpc.protection", "privacy")
    config.set("hbase.regionserver.kerberos.principal", "hbase/_HOST@IO-INT.COM")
    config.set("hbase.master.kerberos.principal", "hbase/_HOST@IO-INT.COM")

    UserGroupInformation.setConfiguration(config)

    // Log in from the keytab: prefer the copy shipped via SparkFiles when it
    // exists, otherwise fall back to the path given in the job properties.
    var ugi: UserGroupInformation = null
    if (SparkFiles.get(properties.keytab) != null
        && new java.io.File(SparkFiles.get(properties.keytab)).exists) {
      ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(properties.kerberosPrincipal,
        SparkFiles.get(properties.keytab))
    } else {
      ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(properties.kerberosPrincipal,
        properties.keytab)
    }

    val connection = ConnectionFactory.createConnection(config)
    (connection, ugi)
  }
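Note that loginUserFromKeytabAndReturnUGI only returns a logged-in UGI; it does not change the current user of the process, so the returned UGI has to be applied explicitly. A minimal sketch of wrapping the connection creation in ugi.doAs, assuming the same config as above (this wrapper is our addition, not part of the original code):

import java.security.PrivilegedExceptionAction
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory}
import org.apache.hadoop.security.UserGroupInformation

// Create the HBase connection as the keytab principal instead of the
// YARN container user; the client captures the calling user at creation time.
def createConnectionAs(ugi: UserGroupInformation, config: Configuration): Connection =
  ugi.doAs(new PrivilegedExceptionAction[Connection] {
    override def run(): Connection = ConnectionFactory.createConnection(config)
  })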

And this is where we connect to HBase: ...

foreachRDD { rdd =>
  if (!rdd.isEmpty()) {
    //var ugi: UserGroupInformation = Utils.getHbaseConnection(properties)._2
    rdd.foreachPartition { partition =>
      val connection = Utils.getHbaseConnection(propsObj)._1
      val table = …
      partition.foreach { json =>

      }
      table.put(puts)
      table.close()
      connection.close()
    }
  }
}
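For illustration, a hedged sketch of the per-partition write running under the returned UGI; the table name ("events"), column family ("d"), and row-key scheme are hypothetical placeholders, not taken from the original job:

import java.security.PrivilegedExceptionAction
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.util.Bytes

rdd.foreachPartition { partition =>
  val (connection, ugi) = Utils.getHbaseConnection(propsObj)
  ugi.doAs(new PrivilegedExceptionAction[Unit] {
    override def run(): Unit = {
      // "events", "d", and the row layout below are placeholders.
      val table = connection.getTable(TableName.valueOf("events"))
      val puts = new java.util.ArrayList[Put]()
      partition.foreach { json =>
        val put = new Put(Bytes.toBytes(json.hashCode.toString))
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("raw"), Bytes.toBytes(json.toString))
        puts.add(put)
      }
      table.put(puts)
      table.close()
    }
  })
  connection.close()
}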

The keytab file is not getting copied to the YARN staging/temp directory, so we do not get it via SparkFiles.get()... And if we pass the keytab with --files, spark-submit fails because it is already being distributed through --keytab.
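One workaround often suggested for this situation (an assumption on our part, not something confirmed in this thread) is to ship a renamed copy of the keytab through --files so its name does not collide with the file already distributed by --keytab, for example:

cp dev.keytab dev_copy.keytab
spark-submit --master yarn-cluster --keytab "dev.keytab" --principal "dev@IO-INT.COM" --conf spark.yarn.stagingDir=hdfs:///tmp/spark/ --files "job.properties,dev_copy.keytab,log4j_driver_conf.properties,log4j_executor_conf.properties" service-0.0.1-SNAPSHOT.jar job.properties

The executors could then resolve it with SparkFiles.get("dev_copy.keytab"); the dev_copy.keytab name is our placeholder.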

1 answer:

Answer 0 (score: 0):

The error is:

This server is in the failed servers list: myserver.test.com/120.111.25.45:60020
RpcRetryingCaller{globalStartTime=1497943263013, pause=100, retries=5}, org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed servers list: myserver.test.com/120.111.25.45:60020
RpcRetryingCaller{globalStartTime=1497943263013, pause=100, retries=5}, org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed servers list: myserver.test.com/120.111.25.45:60020 at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:147)
    at org.apache.hadoop.hbase.client.HTable.get(HTable.java:935)