Main class for a PySpark job in Oozie

Time: 2017-09-19 06:42:45

Tags: pyspark oozie hortonworks-data-platform oozie-coordinator

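I created a PySpark job that works fine when submitted through spark-submit. Now, when I try to run it through Oozie, it fails. I suspect there is a problem with the values I entered in the fields that the Spark action in Oozie requires: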

Spark Master : local
Mode : client 
Main class : Do I need to enter anything here, since it's Python + Spark code (PySpark)?
Jars/py files : My py module


The stdout log is shown below:

  =================================================================

  >>> Invoking Main class now >>>

  Fetching child yarn jobs
  tag id : oozie-653992fdf1609a2d4e19a863dff21a1
  Child yarn jobs are found -
  Spark Action Main class        : org.apache.spark.deploy.SparkSubmit

  Oozie Spark action configuration
  =================================================================

  --master
  local[*]
  --deploy-mode
  client
  --name
  POC1L
  --verbose
  /user/sachinkerala6174/pgm/poc1l.py

  =================================================================

  >>> Invoking Spark class now >>>

  python: can't open file '/user/sachinkerala6174/pgm/poc1l.py': [Errno 2] No such file or directory
  Intercepting System.exit(2)

  <<< Invocation of Main class completed <<<

  Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [2]

  Oozie Launcher failed, finishing Hadoop job gracefully

  Oozie Launcher, uploading action data to HDFS sequence file: hdfs://ip-172-31-53-48.ec2.internal:8020/user/sachinkerala6174/oozie-oozi/0000509-170711051319609-oozie-oozi-W/spark-fea0--spark/action-data.seq

  Oozie Launcher ends

1 answer:

Answer 0 (score: 1)

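You don't need to enter anything in the "Main class" field. Just add the hdfs:// prefix to the Python file path, then change Master to yarn and Mode to cluster (AFAIR this is required when the source code lives on HDFS).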
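
To make the change concrete, here is a minimal Python sketch of the argument list SparkSubmit would presumably receive after the fix, modeled on the "Oozie Spark action configuration" section of the log. The helper names `to_hdfs_uri` and `spark_submit_args` are hypothetical and only for illustration. The name node address is the one that appears in the log's action-data.seq path, and using it as the prefix for the script is an assumption.

```python
from urllib.parse import urlparse

# Assumption: name node address copied from the action-data.seq line in the log.
NAME_NODE = "hdfs://ip-172-31-53-48.ec2.internal:8020"


def to_hdfs_uri(path, name_node=NAME_NODE):
    """Hypothetical helper: prefix a bare path with the name node URI.

    A path that already carries a scheme (hdfs://, file://, ...) is returned unchanged.
    """
    if urlparse(path).scheme:
        return path
    return name_node.rstrip("/") + "/" + path.lstrip("/")


def spark_submit_args(py_file, name="POC1L"):
    """Hypothetical helper: build the arguments suggested in the answer.

    Master is yarn and deploy mode is cluster, instead of local[*] / client.
    No --class argument is added, because a PySpark job has no main class.
    """
    return [
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--name", name,
        "--verbose",
        to_hdfs_uri(py_file),
    ]


args = spark_submit_args("/user/sachinkerala6174/pgm/poc1l.py")
print("\n".join(args))
```

The only differences from the failing run are the master, the deploy mode, and the scheme on the script path. In the failing run, python was handed a bare path and reported that it could not open it, which is consistent with the path being looked up on the local filesystem rather than on HDFS.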