I am trying to submit a step to an existing EMR cluster:
aws emr add-steps --cluster-id j-XXXXXXXXXXXX --steps Type=spark,Name=MyApp,Args=[--deploy-mode,cluster,--master,yarn,--conf,spark.yarn.submit.waitAppCompletion=false,--num-executors,5,--executor-cores,5,--executor-memory,5g,file.py,--py-files,support-files.zip],ActionOnFailure=CONTINUE
However, the step fails with a missing-file error:
16/08/31 19:01:06 INFO RMProxy: Connecting to ResourceManager at ip-xxx-xxx-xxx-xxx.ec2.internal/xxx.xxx.xxx.xxx:8032
16/08/31 19:01:07 INFO Client: Requesting a new application from cluster with 2 NodeManagers
16/08/31 19:01:07 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (11520 MB per container)
16/08/31 19:01:07 INFO Client: Will allocate AM container, with 11520 MB memory including 1047 MB overhead
16/08/31 19:01:07 INFO Client: Setting up container launch context for our AM
16/08/31 19:01:07 INFO Client: Setting up the launch environment for our AM container
16/08/31 19:01:07 INFO Client: Preparing resources for our AM container
16/08/31 19:01:07 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/08/31 19:01:09 INFO Client: Uploading resource file:/mnt/tmp/spark-e3f0079d-1c1a-4e68-9e69-2f83368d4405/__spark_libs__7513771565880645694.zip -> hdfs://ip-xxx-xxx-xxx-xxx.ec2.internal:8020/user/hadoop/.sparkStaging/application_1414534513678_0014/__spark_libs__7513771565880645694.zip
16/08/31 19:01:10 INFO Client: Uploading resource file:/mnt/var/lib/hadoop/steps/s-2746P63VUU8PI/file.py -> hdfs://ip-xxx-xxx-xxx-xxx.ec2.internal:8020/user/hadoop/.sparkStaging/application_1414534513678_0014/file.py
16/08/31 19:01:10 INFO Client: Deleting staging directory hdfs://ip-xxx-xxx-xxx-xxx.ec2.internal:8020/user/hadoop/.sparkStaging/application_1414534513678_0014
Exception in thread "main" java.io.FileNotFoundException: File file:/mnt/var/lib/hadoop/steps/s-2746P63VUU8PI/file.py does not exist
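As far as I can tell, the file was uploaded correctly. Any ideas what else could be causing this?
The file exists in the directory where I run aws emr add-steps.
The file appears to exist at least on the master node; I have no easy way to confirm this, because the process tries to delete it on failure (see the log above).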
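One thing that may be worth checking: the step runs spark-submit on the master node, so a relative file.py is resolved under the step directory there (the path in the exception), not on the machine where aws emr add-steps was run. Separately, spark-submit treats anything after the application file as arguments to the application, so --py-files placed after file.py would not be read as an option. Below is a minimal sketch that builds the same step with both files referenced from S3 and --py-files moved before the script. The bucket name and keys are hypothetical placeholders, and the files are assumed to have been uploaded there beforehand.

```python
# Hypothetical S3 locations; the bucket and keys are placeholders,
# not taken from the original command.
SCRIPT_URI = "s3://my-bucket/file.py"
PY_FILES_URI = "s3://my-bucket/support-files.zip"


def build_step(script_uri: str, py_files_uri: str) -> str:
    """Return a value for `aws emr add-steps --steps` in shorthand syntax."""
    args = [
        "--deploy-mode", "cluster",
        "--master", "yarn",
        "--conf", "spark.yarn.submit.waitAppCompletion=false",
        "--num-executors", "5",
        "--executor-cores", "5",
        "--executor-memory", "5g",
        # spark-submit options must come before the application file;
        # anything after the file is passed to the application itself.
        "--py-files", py_files_uri,
        script_uri,
    ]
    return "Type=Spark,Name=MyApp,Args=[{}],ActionOnFailure=CONTINUE".format(
        ",".join(args)
    )


step = build_step(SCRIPT_URI, PY_FILES_URI)
# Print the full command so it can be reviewed before running it.
print("aws emr add-steps --cluster-id j-XXXXXXXXXXXX --steps " + step)
```

If the rebuilt step still fails, the path in the exception should at least show whether the script is now being read from S3 or from the local step directory.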