How to run Spark jobs on EMR via CloudFormation

Posted: 2019-01-12 01:26:36

Tags: amazon-web-services apache-spark pyspark amazon-cloudformation amazon-emr

I have just started using AWS and have been working with EMR and CloudFormation. My goal is to write a CloudFormation template that will:

1. Create an EMR cluster with Spark and Hadoop installed
2. Run Spark jobs on the EMR cluster. Jobs will be submitted as JAR or PySpark files.

I have been able to complete step 1 successfully, but I am not sure how to do step 2 through CloudFormation.

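I have looked at several examples in the AWS documentation and on other sites, but I have not found one that deploys a Spark job through a CloudFormation template.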

Any examples or pointers in the right direction would be very helpful. Thanks in advance!

1 answer:

Answer 0 (score: 0)

Change your EMR CloudFormation template as follows. Add these to the Parameters section:

StepScriptFilePath:
  Type: String
  Description: Step script to run (a bash script or a Java file)
  Default: 's3://s3-bucket/steps/step1.sh'
StepScriptFilePython:
  Type: String
  Description: Step script to run a Python file
  Default: 's3://s3-bucket/steps/step2.py'
StepJar:
  Type: String
  Description: Script runner JAR used to run the StepScriptFilePath script
  Default: 's3://elasticmapreduce/libs/script-runner/script-runner.jar'
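For context, the Steps list in the next snippet belongs inside the Properties of the AWS::EMR::Cluster resource. Below is a minimal sketch of such a resource with Hadoop and Spark installed (step 1 of the question). The logical ID EMRCluster, the cluster name, the release label and the instance types are illustrative assumptions; the two roles are the defaults that `aws emr create-default-roles` creates, assumed to already exist in the account.

```yaml
Resources:
  EMRCluster:                          # hypothetical logical ID
    Type: AWS::EMR::Cluster
    Properties:
      Name: spark-cluster              # hypothetical cluster name
      ReleaseLabel: emr-5.20.0         # assumption: any EMR release that ships Spark
      Applications:
        - Name: Hadoop
        - Name: Spark
      Instances:
        MasterInstanceGroup:
          InstanceCount: 1
          InstanceType: m5.xlarge      # assumption: pick a type available in your region
          Market: ON_DEMAND
        CoreInstanceGroup:
          InstanceCount: 2
          InstanceType: m5.xlarge
          Market: ON_DEMAND
        KeepJobFlowAliveWhenNoSteps: true  # keep the cluster running after the steps finish
      JobFlowRole: EMR_EC2_DefaultRole     # default EC2 instance profile for EMR
      ServiceRole: EMR_DefaultRole         # default EMR service role
      # The Steps list goes here, at the same indentation level as Applications.
```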

Add the following under the EMR cluster's Properties:

  Steps:
    - ActionOnFailure: CONTINUE
      HadoopJarStep:
        Args:
          - Ref: StepScriptFilePath
        Jar:
          Ref: StepJar
        MainClass: ''
      Name: run any bash or java job in spark
    - ActionOnFailure: CONTINUE
      HadoopJarStep:
        Args:
          - "spark-submit"
          - Ref: StepScriptFilePython
        Jar: command-runner.jar
      Name: run a python script job
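The question also mentions Spark jobs packaged as a JAR. Following the same command-runner.jar pattern as the Python step, a JAR can be submitted by passing its main class to spark-submit with --class. Below is a sketch of one more item for the Steps list; the main class com.example.SparkApp and the S3 path of the JAR are hypothetical placeholders.

```yaml
    - ActionOnFailure: CONTINUE
      HadoopJarStep:
        Args:
          - "spark-submit"
          - "--deploy-mode"
          - "cluster"
          - "--class"
          - "com.example.SparkApp"                 # hypothetical main class
          - "s3://s3-bucket/jars/spark-app.jar"    # hypothetical JAR location
        Jar: command-runner.jar
      Name: run a Spark JAR job
```

ActionOnFailure: CONTINUE lets the remaining steps run even if this one fails; CANCEL_AND_WAIT and TERMINATE_CLUSTER are the alternatives if a failed job should stop the pipeline.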