I have a simple Spark script that I want to run on EMR via a step. Here it is:
FileInDLK_ul = "s3://Bucket/something.csv.gz"
df_ul = spark.read.csv(FileInDLK_ul, header=True)
df_ul.repartition(10).write.format("parquet").save("s3://AnotherBucket")
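For reference, when a script like this is submitted via spark-submit (for example as an EMR step) rather than run in an interactive notebook, the `spark` session object is not pre-defined and has to be created in the script itself. A minimal self-contained sketch, assuming that is the only difference (the appName is just a placeholder I made up):

from pyspark.sql import SparkSession

# Create the session explicitly; notebooks provide `spark` for you,
# but a standalone script submitted as a step does not.
spark = SparkSession.builder.appName("CsvToParquet").getOrCreate()

FileInDLK_ul = "s3://Bucket/something.csv.gz"
df_ul = spark.read.csv(FileInDLK_ul, header=True)
df_ul.repartition(10).write.format("parquet").save("s3://AnotherBucket")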
When I test it through Zeppelin, it runs perfectly.
When I launch it as an EMR step, it fails immediately with:
20/03/04 17:16:36 INFO Client: Application report for application_1583330635514_0007 (state: ACCEPTED)
20/03/04 17:16:37 INFO Client: Application report for application_1583330635514_0007 (state: ACCEPTED)
20/03/04 17:16:38 INFO Client: Application report for application_1583330635514_0007 (state: FAILED)
20/03/04 17:16:38 INFO Client:
client token: N/A
diagnostics: Application application_1583330635514_0007 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1583330635514_0007_000001 exited with exitCode: 13
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_1583330635514_0007_01_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
at org.apache.hadoop.util.Shell.run(Shell.java:869)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
These are the arguments I use for the step:
What am I missing?