Running a Spark Job with Oozie and Hue via an Uber JAR

Date: 2016-09-16 18:36:18

Tags: hadoop apache-spark oozie cloudera-cdh hue

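I am currently learning how to run Spark jobs with Apache Oozie on CDH 5.8, but I seem to have run into a problem.

I use IntelliJ > Build Artifacts to package my Spark job into an Uber JAR (fat JAR), and then remove its manifest file. I then run the JAR with spark-submit, and it works fine.

But when I specify a Spark action in Oozie, I get the following error: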

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exception invoking main(), java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SparkMain not found
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SparkMain not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2199)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:234)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SparkMain not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
    ... 9 more

job.properties:

oozie.use.system.libpath=false
security_enabled=False
dryrun=False
jobTracker=master.meshiang:8032
nameNode=hdfs://master.meshiang:8020

My workflow:

<workflow-app name="CSV" xmlns="uri:oozie:workflow:0.4">
    <start to="spark-2bab"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="spark-2bab">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>local[*]</master>
            <mode>client</mode>
            <name>MySpark</name>
            <class>ETL.CSVTransform</class>
            <jar>/user/meshiang/jar/Spark-GetData.jar</jar>
            <arg>work_id</arg>
            <arg>csv_table</arg>
            <arg>id/FirstName/Lastname</arg>
            <arg>/user/meshiang/csv/ST1471448595.csv</arg>
            <arg>,</arg>
        </spark>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>

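What I have already tried:

  1. I put the same JAR into the workspace's lib/ folder and used it the same way as above. The job ran for 10 minutes and then was killed, without showing any error code or message.
  2. I ran the Spark example job in Hue and got the following error: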

    JA018
    Error Message   Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.RuntimeException: Stream '/jars/oozie-examples.jar' was not found. at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:219) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:106) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(Tr
    

My questions:

  1. Should I compile only the classes I need and rely on the Oozie ShareLib? Does Oozie support Uber JARs in general?
  2. If I were using Pig / Sqoop, would I need to do the same?

1 Answer:

Answer 0 (score: 0)

To resolve ClassNotFoundException: Class org.apache.oozie.action.hadoop.SparkMain, you need to enable the Oozie system libpath property:

oozie.use.system.libpath=true

This is required to run any Hive, Pig, Sqoop, Spark, etc. job.

You can compile and build your Spark application JARs and put them in the lib directory under the Oozie application path. The Oozie application path is the directory in HDFS where the workflow's files are stored and referenced from.
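As a rough sketch of the change described above, the job.properties from the question might look like this once the system libpath is enabled. Only the first property differs from the question. The directory layout in the comments is an assumption for illustration: `<app-path>` is a placeholder, not a path taken from the question.

```properties
# job.properties (sketch): values copied from the question,
# with the system libpath switched from false to true.
oozie.use.system.libpath=true
security_enabled=False
dryrun=False
jobTracker=master.meshiang:8032
nameNode=hdfs://master.meshiang:8020

# Assumed layout of the Oozie application path in HDFS
# (<app-path> is a hypothetical placeholder):
#   <app-path>/workflow.xml
#   <app-path>/lib/Spark-GetData.jar
```

With the ShareLib available to the launcher, org.apache.oozie.action.hadoop.SparkMain should be found, assuming the Spark ShareLib is installed on the cluster.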

Hope this helps. Thanks.