How do I provide an S3 path as input to a HadoopJarStep when using RunJobFlow in EMR?

Time: 2017-10-17 07:16:53

Tags: pyspark boto3 emr amazon-emr

I am trying to run spark-submit by adding a step with run_job_flow from boto3 while creating a new EMR cluster. I want to pass a file stored in S3 by giving its S3 file path in Args. Here is what I have:

'Args': ['spark-submit --deploy-mode cluster s3://{Bucket}/{path}']

Here is the stderr output from the EMR console:

Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Cannot run program "spark-submit --deploy-mode cluster s3://{Bucket}/{path}" (in directory "."): error=2, No such file or directory
    at com.amazonaws.emr.command.runner.ProcessRunner.exec(ProcessRunner.java:139)
    at com.amazonaws.emr.command.runner.CommandRunner.main(CommandRunner.java:13)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Cannot run program "spark-submit --deploy-mode cluster s3://{Bucket}/{path}" (in directory "."): error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at com.amazonaws.emr.command.runner.ProcessRunner.exec(ProcessRunner.java:92)
    ... 7 more
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 8 more
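
One possible reading of the trace above: the entire string "spark-submit --deploy-mode cluster s3://{Bucket}/{path}" was handed to ProcessBuilder as a single program name, so the S3 path itself may not be the problem. The sketch below is an assumption, not a confirmed fix. It splits the command into one Args element per token and assumes the step runs through command-runner.jar (suggested by the CommandRunner frames in the trace). The helper name build_spark_step, the step name, and the ActionOnFailure value are hypothetical choices for illustration.

```python
import shlex


def build_spark_step(command, name="spark-submit step"):
    """Build an EMR step dict with one Args element per command token.

    Hypothetical helper: the name, default step name, and ActionOnFailure
    value are illustrative assumptions, not taken from the question.
    """
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # Assumed from the CommandRunner frames in the stack trace.
            "Jar": "command-runner.jar",
            # shlex.split turns the single string into separate tokens,
            # so "spark-submit" alone becomes the program to execute.
            "Args": shlex.split(command),
        },
    }


# The {Bucket} and {path} placeholders are kept exactly as in the question.
step = build_spark_step("spark-submit --deploy-mode cluster s3://{Bucket}/{path}")
print(step["HadoopJarStep"]["Args"])
```

The resulting dict would go in the Steps list passed to run_job_flow. Whether this resolves the error on a given EMR release is untested here.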

0 answers:

No answers