I'm just getting started with AWS and have been working with EMR and CloudFormation. My goal is to write a CloudFormation template that will:
1. Create an EMR cluster with Spark and Hadoop installed
2. Run Spark jobs on the EMR cluster. Jobs will be submitted as JAR or PySpark files.
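I've been able to complete step 1 successfully, but I'm not sure how to do step 2 through CloudFormation.
I've looked at several examples in the AWS documentation and on other sites, but none of them deploys a Spark job through a CloudFormation template.
Any examples or pointers in the right direction would be very helpful. Thanks in advance!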
Answer:
Change your EMR CloudFormation template as follows. In the Parameters section, add:
StepScriptFilePath:
  Type: String
  Description: S3 path of a bash script or Java file to run as a step
  Default: 's3://s3-bucket/steps/step1.sh'
StepScriptFilePython:
  Type: String
  Description: S3 path of a Python (PySpark) file to run as a step
  Default: 's3://s3-bucket/steps/step2.py'
StepJar:
  Type: String
  Description: JAR that runs the step script (script-runner by default)
  Default: 's3://elasticmapreduce/libs/script-runner/script-runner.jar'
Then add the following under the Properties of your EMR cluster resource:
Steps:
  - ActionOnFailure: CONTINUE
    HadoopJarStep:
      Args:
        # Must match the parameter name declared above
        - Ref: StepScriptFilePath
      Jar:
        Ref: StepJar
      MainClass: ''
    Name: run any bash or java job in spark
  - ActionOnFailure: CONTINUE
    HadoopJarStep:
      Args:
        - "spark-submit"
        - Ref: StepScriptFilePython
      Jar: command-runner.jar
    Name: run a python script job
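The steps above cover a bash script and a PySpark file. For the other case in the question, a Spark job packaged as a JAR, a minimal sketch is a standalone `AWS::EMR::Step` resource (placed under Resources) that calls `spark-submit` through `command-runner.jar`. The logical ID `EMRCluster`, the class `com.example.SparkApp` and the JAR path are hypothetical placeholders; replace them with your own. `Ref` on an `AWS::EMR::Cluster` resource returns the cluster ID, which is what `JobFlowId` expects.

```yaml
# Hypothetical example: EMRCluster is assumed to be the logical ID of your
# AWS::EMR::Cluster resource; the main class and JAR path are placeholders.
SparkJarStep:
  Type: AWS::EMR::Step
  Properties:
    Name: run a Spark JAR job
    ActionOnFailure: CONTINUE
    JobFlowId:
      Ref: EMRCluster                        # resolves to the cluster ID
    HadoopJarStep:
      Jar: command-runner.jar                # runs the command given in Args
      Args:
        - "spark-submit"
        - "--deploy-mode"
        - "cluster"
        - "--class"
        - "com.example.SparkApp"             # hypothetical main class
        - "s3://s3-bucket/jars/spark-app.jar" # hypothetical JAR location
```

Keeping the job in a separate step resource, rather than in the cluster's own Steps list, should let you add or change a job without editing the cluster resource itself; check the update behavior in the CloudFormation reference for your use case.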