Question

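I have a Spark application that runs fine in standalone mode. I am now trying to run the same application on an AWS EMR cluster, but it currently fails.

The message is one I have not seen before. It suggests that the workers are not receiving any work and are being shut down.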

    16/11/30 14:45:00 INFO ExecutorAllocationManager: Removing executor 3 because it has been idle for 60 seconds (new desired total will be 7)
    16/11/30 14:45:00 INFO YarnClientSchedulerBackend: Requesting to kill executor(s) 2
    16/11/30 14:45:00 INFO ExecutorAllocationManager: Removing executor 2 because it has been idle for 60 seconds (new desired total will be 6)
    16/11/30 14:45:00 INFO YarnClientSchedulerBackend: Requesting to kill executor(s) 4
    16/11/30 14:45:00 INFO ExecutorAllocationManager: Removing executor 4 because it has been idle for 60 seconds (new desired total will be 5)
    16/11/30 14:45:01 INFO YarnClientSchedulerBackend: Requesting to kill executor(s) 7
    16/11/30 14:45:01 INFO ExecutorAllocationManager: Removing executor 7 because it has been idle for 60 seconds (new desired total will be 4)

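The DAG shows the workers initializing, then a collect (a relatively small one), and shortly afterwards they all fail. Dynamic allocation was enabled, so one theory was that the driver was not sending the executors any tasks and they timed out. To test that theory I spun up another cluster without dynamic allocation, and the same thing happened.

The master is set to yarn.

Any help is much appreciated, thanks.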

    16/11/30 14:49:16 INFO BlockManagerMaster: Removal of executor 21 requested
    16/11/30 14:49:16 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asked to remove non-existent executor 21
    16/11/30 14:49:16 INFO BlockManagerMasterEndpoint: Trying to remove executor 21 from BlockManagerMaster.
    16/11/30 14:49:24 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1480517110174_0001_01_000049 on host: ip-10-138-114-125.ec2.internal. Exit status: 1. Diagnostics: Exception from container-launch.
    Container id: container_1480517110174_0001_01_000049
    Exit code: 1
    Stack trace: ExitCodeException exitCode=1: 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
        at org.apache.hadoop.util.Shell.run(Shell.java:456)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
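
The WARN line above only reports that the container exited with status 1; the underlying error would normally be in that container's own stderr. As a minimal sketch, assuming YARN log aggregation is enabled on the cluster and that the container ID follows the `container_<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>` layout seen in this log, the application ID can be derived from the failed container ID and used to build a `yarn logs` command:

```shell
#!/usr/bin/env bash
# Container ID copied from the WARN line in the log above.
container_id="container_1480517110174_0001_01_000049"

# Assumption: the ID has the form
#   container_<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>
# so the application ID is "application_" plus the first two numeric fields.
app_id="$(printf '%s\n' "$container_id" \
  | sed -E 's/^container_([0-9]+)_([0-9]+)_.*$/application_\1_\2/')"

# Print the command to run on a cluster node where the yarn CLI is available.
echo "yarn logs -applicationId $app_id"
```

The printed command is meant to be run on the EMR master node. If log aggregation is not enabled, the same stderr file would have to be read from the node that hosted the container instead.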

My step is very simple: spark-submit --deploy-mode client --master yarn --class Run app.jar
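
For reference, the submit command above could be run with dynamic allocation switched off explicitly, which is one way to reproduce the "without dynamic allocation" test described earlier. This is only a sketch: `spark.dynamicAllocation.enabled` and `spark.executor.instances` are standard Spark properties, but the executor count of 4 is an arbitrary placeholder, and the question does not say how the second cluster was actually configured.

```shell
# Sketch: same step as above, with dynamic allocation disabled.
# The executor count (4) is a placeholder, not a value from the question.
spark-submit \
  --deploy-mode client \
  --master yarn \
  --conf spark.dynamicAllocation.enabled=false \
  --conf spark.executor.instances=4 \
  --class Run app.jar
```

With a fixed executor count, the ExecutorAllocationManager idle-timeout messages should no longer appear, so any remaining container failures would have a different cause.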

Spark EMR cluster removes executors at runtime because they are idle

0 answers: