Running a spark2 job via an Oozie shell action?

Date: 2019-05-10 13:54:36

Tags: shell apache-spark hadoop hbase oozie

As the title says, I am trying to run a shell action that launches a Spark job, but unfortunately I keep hitting the following error...

    19/05/10 14:03:39 ERROR AbstractRpcClient: SASL authentication failed.
    The most likely cause is missing or invalid credentials. Consider 'kinit'.
    javax.security.sasl.SaslException: GSS initiate failed [Caused by
    GSSException: No valid credentials provided (Mechanism level: Failed
    to find any Kerberos tgt)]
    java.io.IOException: Could not set up IO Streams to <hbaseregionserver>
    Fri May 10 14:03:39 BST 2019,
    RpcRetryingCaller{globalStartTime=1557493419339, pause=100, retries=2},
    org.apache.hadoop.hbase.ipc.FailedServerException: This
    server is in the failed servers list: <hbaseregionserver>

As far as I can tell, the cause is that the Kerberos ticket is not being passed through to the Oozie job, so I tried having the script obtain its own ticket with kinit, but with no luck. I'm at a loss. The relevant code is below.

Oozie workflow action

    <action name="sparkJ" cred="hive2Cred">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${oozieQueueName}</value>
                </property>
            </configuration>
            <exec>run.sh</exec>
            <file>/thePathToTheScript/run.sh#run.sh</file>
            <file>/thePathToTheProperties/myp.properties#myp.properties</file>
            <capture-output />
        </shell>
        <ok to="end" />
        <error to="fail" />
    </action>
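
The action above only attaches the hive2Cred credential, while the failure is coming from HBase RPC, so one thing I have been wondering is whether the action also needs an HBase credential. A rough sketch of what that might look like in the same workflow is below; the credential name hbaseCred and the property values are placeholders for illustration (and this assumes the cluster has the hbase credential class configured in oozie-site), not something taken from my real workflow:

    <credentials>
        <credential name="hbaseCred" type="hbase">
            <property>
                <name>hbase.zookeeper.quorum</name>
                <value><!-- placeholder: the cluster's ZooKeeper quorum --></value>
            </property>
            <property>
                <name>hadoop.security.authentication</name>
                <value>kerberos</value>
            </property>
            <property>
                <name>hbase.security.authentication</name>
                <value>kerberos</value>
            </property>
        </credential>
    </credentials>

    <!-- then reference both credentials on the action -->
    <action name="sparkJ" cred="hive2Cred,hbaseCred">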

Shell script

    #!/bin/sh
    export job_name=SPARK_JOB
    export configuration=myp.properties

    export num_executors=10
    export executor_memory=1G
    export queue=YARNQ
    export max_executors=50

    kinit -kt KEYTAB KPRINCIPAL

    echo "[[[[[[[[[[[[[ Starting Job - name:${job_name}, configuration:${configuration} ]]]]]]]]]]]]]]"

    /usr/hdp/current/spark2-client/bin/spark-submit \
      --name ${job_name} \
      --driver-java-options "-Dlog4j.configuration=file:./log4j.properties" \
      --num-executors ${num_executors} \
      --executor-memory ${executor_memory} \
      --master yarn \
      --keytab KEYTAB \
      --principal KPRINCIPAL \
      --supervise \
      --deploy-mode cluster \
      --queue ${queue} \
      --files "./${configuration},./hbase-site.xml,./log4j.properties" \
      --conf spark.driver.extraClassPath="/usr/hdp/current/hive-client/lib/datanucleus-*.jar:/usr/hdp/current/tez-client/*.jar" \
      --conf spark.executor.extraJavaOptions="-Djava.security.auth.login.config=./jaas.conf -Dlog4j.configuration=file:./log4j.properties" \
      --conf spark.executor.extraClassPath="/usr/hdp/current/hive-client/lib/datanucleus-*.jar:/usr/hdp/current/tez-client/*.jar" \
      --conf spark.streaming.stopGracefullyOnShutdown=true \
      --conf spark.dynamicAllocation.enabled=true \
      --conf spark.shuffle.service.enabled=true \
      --conf spark.dynamicAllocation.maxExecutors=${max_executors} \
      --conf spark.streaming.concurrentJobs=2 \
      --conf spark.streaming.backpressure.enabled=true \
      --conf spark.yarn.security.tokens.hive.enabled=true \
      --conf spark.yarn.security.tokens.hbase.enabled=true \
      --conf spark.streaming.kafka.maxRatePerPartition=5000 \
      --conf spark.streaming.backpressure.pid.maxRate=3000 \
      --conf spark.streaming.backpressure.pid.minRate=200 \
      --conf spark.streaming.backpressure.initialRate=5000 \
      --jars /usr/hdp/current/hbase-client/lib/guava-12.0.1.jar,/usr/hdp/current/hbase-client/lib/hbase-common.jar,/usr/hdp/current/hbase-client/lib/hbase-client.jar,/usr/hdp/current/hbase-client/lib/hbase-protocol.jar,/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar \
      --class myclass myjar.jar ./${configuration}
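
Since the kinit in the script does not seem to take effect, I have also been wondering whether the ticket cache inside the launcher container is the problem. A rough sketch of what I could try at the top of the script, before spark-submit, is below; the cache location and the unsetting of HADOOP_TOKEN_FILE_LOCATION are guesses to experiment with, not a confirmed fix:

    # write the TGT to an explicit cache in the container's working directory
    export KRB5CCNAME=$PWD/krb5cc_oozie_spark
    kinit -kt KEYTAB KPRINCIPAL

    # verify the ticket actually exists before submitting
    klist

    # the Oozie launcher exports HADOOP_TOKEN_FILE_LOCATION; if that token file
    # has no HBase token, clearing it might let spark-submit fall back to the
    # keytab/TGT above (a guess - would need to be verified on the cluster)
    unset HADOOP_TOKEN_FILE_LOCATION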

Any help would be greatly appreciated.

0 Answers:

No answers yet.