Question

I am trying to schedule a PySpark job in Oozie (CDH 5.7), but it keeps throwing an error. Please find my workflow below.

I have placed the .py script in both a local path and an HDFS path. Please let me know if I need to modify anything.

Error: [org.apache.oozie.action.hadoop.SparkMain], exit code [1]

select username,
       max(case when date = '7-1-2016' then date end),
       max(case when date = '7-2-2016' then date end),
       max(case when date = '7-3-2016' then date end),
       max(case when date = '7-4-2016' then date end),
       max(case when date = '7-5-2016' then date end),
       max(case when date = '7-6-2016' then date end),
       max(case when date = '7-7-2016' then date end),
       max(case when date = '7-8-2016' then date end)
from yourtable
group by username

Answer 1

I found the solution.

The PySpark job must be referenced by an HDFS path, and it should be the full path, such as hdfs://user/****
spark_home also needs to be included, which I have now done.
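
The two points above could look roughly like the following in a workflow definition. This is only a sketch, not the workflow from the question: the action name, the script path under ${nameNode}, the /path/to/spark location, and the choice of passing SPARK_HOME through spark-opts are all assumptions made for illustration. Adjust them to your cluster.

```xml
<workflow-app name="pyspark-example-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="pyspark-node"/>
    <action name="pyspark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>pyspark-example</name>
            <!-- Full HDFS URI of the .py script; this path is a hypothetical placeholder -->
            <jar>${nameNode}/user/example_user/apps/pyspark/example_job.py</jar>
            <!-- Assumed way to expose SPARK_HOME to the driver and executors; the install path is a placeholder -->
            <spark-opts>--conf spark.yarn.appMasterEnv.SPARK_HOME=/path/to/spark --conf spark.executorEnv.SPARK_HOME=/path/to/spark</spark-opts>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>PySpark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Here ${jobTracker} and ${nameNode} are expected to come from job.properties, so that the jar element resolves to a complete hdfs:// URI rather than a local or relative path.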

Thanks.

Oozie - PySpark job throwing error
