Spark cluster-mode issue reading a Hive-HBase table in a Kerberized environment

Date: 2018-03-21 15:45:17

Tags: apache-spark hive hbase kerberos cluster-mode

Problem description

We are unable to run our Spark job in yarn-cluster or yarn-client mode, although it works fine in local mode.

The problem occurs when we try to read a Hive-HBase table in a Kerberized cluster.
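
For reference, a minimal sketch of the kind of read that triggers the failure, assuming Spark 1.6 on HDP 2.5 and a Hive table backed by HBase through the hive-hbase-handler; the object name and table name are illustrative placeholders, not taken from the original job:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HiveHbaseReader {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("HiveHbaseReader"))
        val hiveContext = new HiveContext(sc)
        // Reading a Hive table that is backed by HBase goes through
        // HBaseStorageHandler, which is where the Kerberos failure shown in
        // the error log below surfaces in yarn-cluster/yarn-client mode.
        val df = hiveContext.sql("SELECT * FROM hive_hbase_table LIMIT 10")
        df.show()
        sc.stop()
      }
    }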

What we have tried so far

  1. Passing all the HBase jars in the --jars argument of spark-submit:

    --jars /usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.5.3.16-1.jar,/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar,/usr/hdp/current/hbase-client/lib/hbase-client.jar,/usr/hdp/current/hbase-client/lib/hbase-common.jar,/usr/hdp/current/hbase-client/lib/hbase-protocol.jar,/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar,/usr/hdp/current/hbase-client/lib/protobuf-java-2.5.0.jar,/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar,/usr/hdp/current/hbase-client/lib/hbase-server.jar

  2. Passing hbase-site.xml and hive-site.xml in the --files argument of spark-submit:

      --files /usr/hdp/2.5.3.16-1/hbase/conf/hbase-site.xml,/usr/hdp/current/spark-client/conf/hive-site.xml,/home/pasusr/pasusr.keytab

  3. Performing Kerberos authentication inside the application, explicitly passing the keytab in the code:

        // Log in from the keytab and create the HBase connection inside a doAs
        // block; 'original' holds the Kerberos principal name, 'keyTab' the keytab path.
        UserGroupInformation.setConfiguration(configuration)
        val ugi: UserGroupInformation =
          UserGroupInformation.loginUserFromKeytabAndReturnUGI(original, keyTab)
        UserGroupInformation.setLoginUser(ugi)
        ConnectionFactory.createConnection(configuration)
        return ugi.doAs(new PrivilegedExceptionAction[Connection] {
          @throws[IOException]
          def run(): Connection = {
            ConnectionFactory.createConnection(configuration)
          }
        })

  4. Passing the keytab information in spark-submit (see the combined spark-submit sketch after this list)
  5. Passing the HBase jars in spark.driver.extraClassPath and spark.executor.extraClassPath
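
For context, here is a sketch of how the options from the attempts above might be combined into a single spark-submit invocation. The jar and config paths are taken from the snippets above; the Kerberos realm, the application class, and the application jar name are illustrative placeholders, not values from the original question:

    # Sketch only: realm, --class, and application jar are placeholders
    HBASE_LIB=/usr/hdp/current/hbase-client/lib
    spark-submit \
      --master yarn --deploy-mode cluster \
      --principal pasusr@EXAMPLE.COM \
      --keytab /home/pasusr/pasusr.keytab \
      --files /usr/hdp/2.5.3.16-1/hbase/conf/hbase-site.xml,/usr/hdp/current/spark-client/conf/hive-site.xml \
      --jars /usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.5.3.16-1.jar,${HBASE_LIB}/hbase-client.jar,${HBASE_LIB}/hbase-common.jar,${HBASE_LIB}/hbase-server.jar,${HBASE_LIB}/hbase-protocol.jar,${HBASE_LIB}/guava-12.0.1.jar \
      --conf spark.driver.extraClassPath=${HBASE_LIB}/hbase-common.jar:${HBASE_LIB}/hbase-client.jar:${HBASE_LIB}/hbase-server.jar:${HBASE_LIB}/hbase-protocol.jar:${HBASE_LIB}/guava-12.0.1.jar \
      --conf spark.executor.extraClassPath=${HBASE_LIB}/hbase-common.jar:${HBASE_LIB}/hbase-client.jar:${HBASE_LIB}/hbase-server.jar:${HBASE_LIB}/hbase-protocol.jar:${HBASE_LIB}/guava-12.0.1.jar \
      --class com.example.HiveHbaseReader \
      my-spark-job.jar
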
Error log

        18/03/20 15:33:24 WARN TableInputFormatBase: You are using an HTable instance that relies on an HBase-managed Connection. This is usually due to directly creating an HTable, which is deprecated. Instead, you should create a Connection object and then request a Table instance from it. If you don't need the Table instance for your own use, you should instead use the TableInputFormatBase.initalizeTable method directly.
        18/03/20 15:47:38 WARN TaskSetManager: Lost task 0.0 in stage 7.0 (TID 406, hadoopnode.server.name): java.lang.IllegalStateException: Error while configuring input job properties
            at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureTableJobProperties(HBaseStorageHandler.java:444)
            at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureInputJobProperties(HBaseStorageHandler.java:342)
        Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=50, exceptions:
        Caused by: java.lang.RuntimeException: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
            at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$1.run(RpcClientImpl.java:679)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:422)
        Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
        

1 answer:

Answer 0 (score: 0)

I was able to resolve this issue by adding the following configuration in spark-env.sh:

    export SPARK_CLASSPATH=/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar

And removed spark.driver.extraClassPath and spark.executor.extraClassPath, through which I had been passing the above jars in the spark-submit command.
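
For completeness, a sketch of what the submit command might look like after this change, with the two extraClassPath confs removed; the realm, class, and jar names are again placeholders, and the --jars and --files options from the question stay as before:

    # Sketch: extraClassPath confs dropped; per the answer, SPARK_CLASSPATH in
    # spark-env.sh now supplies the HBase jars on the driver/executor classpath.
    # The --jars list from the question can remain unchanged.
    spark-submit \
      --master yarn --deploy-mode cluster \
      --principal pasusr@EXAMPLE.COM \
      --keytab /home/pasusr/pasusr.keytab \
      --files /usr/hdp/2.5.3.16-1/hbase/conf/hbase-site.xml,/usr/hdp/current/spark-client/conf/hive-site.xml \
      --class com.example.HiveHbaseReader \
      my-spark-job.jar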