What is the correct way to submit a Python application on AWS EMR?

Asked: 2016-08-29 21:07:52

Tags: python amazon-web-services apache-spark emr

I am connected to the master node of a Spark cluster running inside EMR, and I am trying to submit a Python-based application:

spark-submit --verbose --deploy-mode cluster --master yarn-cluster --num-executors 3 --executor-cores 6 --executor-memory 1g test.py 

The process produces a batch of log output, including the following confirmation that the files were deployed to the cluster:

16/08/29 20:47:51 INFO Client: Uploading resource file:/home/hadoop/test.py -> hdfs://ip-xxx-xxx-xxx-xxx.ec2.internal:8020/user/hadoop/.sparkStaging/application_1472396426409_0007/test.py
16/08/29 20:47:51 INFO Client: Uploading resource file:/usr/lib/spark/python/lib/pyspark.zip -> hdfs://ip-xxx-xxx-xxx-xxx.ec2.internal:8020/user/hadoop/.sparkStaging/application_1472396426409_0007/pyspark.zip
16/08/29 20:47:51 INFO Client: Uploading resource file:/usr/lib/spark/python/lib/py4j-0.10.1-src.zip -> hdfs://ip-xxx-xxx-xxx-xxx.ec2.internal:8020/user/hadoop/.sparkStaging/application_1472396426409_0007/py4j-0.10.1-src.zip

However, the application fails to run, apparently reporting that the py4j library is missing:

16/08/29 20:48:47 INFO Client: Application report for application_1472396426409_0007 (state: ACCEPTED)
16/08/29 20:48:48 INFO Client: Application report for application_1472396426409_0007 (state: FAILED)
16/08/29 20:48:48 INFO Client: 
     client token: N/A
     diagnostics: Application application_1472396426409_0007 failed 2 times due to AM Container for appattempt_1472396426409_0007_000002 exited with  exitCode: -1000
For more detailed output, check application tracking page:http://ip-xxx-xxx-xxx-xxx.ec2.internal:8088/cluster/app/application_1472396426409_0007Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://ip-xxx-xxx-xxx-xxx.ec2.internal:8020/user/hadoop/.sparkStaging/application_1472396426409_0007/py4j-0.10.1-src.zip
java.io.FileNotFoundException: File does not exist: hdfs://ip-xxx-xxx-xxx-xxx.ec2.internal:8020/user/hadoop/.sparkStaging/application_1472396426409_0007/py4j-0.10.1-src.zip
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)

Am I misusing this command?

1 Answer:

Answer 0 (score: 1)

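This appears to be a bug in the AWS system. YARN monitors the system and notices that the deployed code (the files staged under .sparkStaging) no longer exists, which actually indicates that Spark has already finished processing.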

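To verify whether this is the problem, double-check by reading the application's logs, i.e., run something like the following on the master node:

yarn logs -applicationId application_1472396426409_0007

and check whether the logs contain a success message.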
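
The log check can also be scripted. The following is only a minimal Python sketch, assuming the `yarn` CLI is on the PATH of the master node. The function names `fetch_yarn_logs` and `find_marker_lines` are hypothetical, and the default marker string `Final app status: SUCCEEDED` is an assumption about what the Spark ApplicationMaster writes; it may differ between Spark versions, so adjust it to whatever your logs actually contain.

```python
import subprocess


def fetch_yarn_logs(app_id):
    """Return the aggregated YARN logs for app_id as one string.

    Assumes the `yarn` CLI is installed and on the PATH (as on an EMR master node).
    """
    result = subprocess.run(
        ["yarn", "logs", "-applicationId", app_id],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr so nothing is lost
        universal_newlines=True,    # decode output as text
        check=True,                 # raise if the yarn command itself fails
    )
    return result.stdout


def find_marker_lines(log_text, marker="Final app status: SUCCEEDED"):
    """Return every log line that contains marker.

    The default marker is an assumption, not a guaranteed log format.
    """
    return [line for line in log_text.splitlines() if marker in line]
```

On the master node, calling `find_marker_lines(fetch_yarn_logs("application_1472396426409_0007"))` would return the matching lines, and an empty list would mean no such marker was found. The search is kept separate from the `yarn` call so it can be tried on a saved log file without a cluster.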