Can I run pyspark as a shell job in Oozie?

Date: 2017-07-26 11:13:54

Tags: hadoop apache-spark pyspark hdfs oozie

I have a Python script that I can run via spark-submit. I need to run it from Oozie.

<!-- move files from local disk to hdfs -->
<action name="forceLoadFromLocal2hdfs">
<shell xmlns="uri:oozie:shell-action:0.3">
  <job-tracker>${jobTracker}</job-tracker>
  <name-node>${nameNode}</name-node>
  <configuration>
    <property>
      <name>mapred.job.queue.name</name>
      <value>${queueName}</value>
    </property>
  </configuration>
  <exec>driver-script.sh</exec>
<!-- single -->
  <argument>s</argument>
<!-- py script -->
  <argument>load_local_2_hdfs.py</argument>
<!-- local file to be moved-->
  <argument>localPathFile</argument>
<!-- hdfs destination folder; be aware the script deletes any existing folder! -->
  <argument>hdfFolder</argument>
  <file>${workflowRoot}driver-script.sh#driver-script.sh</file>
  <file>${workflowRoot}load_local_2_hdfs.py#load_local_2_hdfs.py</file>
</shell>
<ok to="end"/>
<error to="killAction"/> 
</action>
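For reference, a minimal sketch of what a driver-script.sh like the one above might look like. The real script is not shown in the question, so the dispatch logic and the run_mode function name here are assumptions:

```shell
#!/bin/bash
# Hypothetical sketch of driver-script.sh -- the actual script is not
# shown in the question, so this argument handling is an assumption.
run_mode() {
  local mode="$1" script="$2" src="$3" dest="$4"
  case "$mode" in
    # "s" = single-file mode, matching the <argument>s</argument> above
    s) python3 "$script" "$src" "$dest" ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

# Only dispatch when called with the full argument list.
if [ "$#" -ge 4 ]; then
  run_mode "$@"
fi
```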

The script itself runs fine via driver-script.sh. When run through Oozie, however, the file is never copied to HDFS, even though the workflow status is SUCCEEDED. I also cannot find any error output or other logs from the pyspark job.
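One common cause of exactly this symptom is that an Oozie shell action runs inside a YARN container on an arbitrary cluster node, so a "local" path that exists on the edge node may not exist where the action actually executes. A defensive check like the following sketch (the function name is mine, not part of the original script) turns a silent no-op into a visible failure:

```shell
#!/bin/bash
# Sketch: fail loudly if the expected local file is missing on the node
# where the shell action actually runs. The function name is
# hypothetical, not part of the original driver script.
require_local_file() {
  if [ ! -f "$1" ]; then
    echo "ERROR: local file not found on $(hostname): $1" >&2
    return 1
  fi
}
```

Calling require_local_file "$localPathFile" before the copy, combined with exiting on error, makes the action show up as ERROR in the Oozie UI instead of a misleading SUCCEEDED.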

I have another question about the suppressed logs from Spark jobs run by Oozie here.

1 Answer:

Answer 0 (score: 0):

Add set -x at the beginning of your script; that will show you which line the script is at as it runs. You can see that trace in the stderr output.
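A minimal illustration of what set -x does: each command is echoed to stderr with a "+" prefix before it runs, so the last "+" line in the action's stderr log shows how far the script got.

```shell
#!/bin/bash
# With tracing on, every command is printed to stderr before execution.
set -x
msg="copying file"
echo "$msg"
set +x   # tracing off again
```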

Can you elaborate on what you mean by "the file is not copied"? That would help us help you better.