SparkAppHandle state is LOST after submission, but the driver runs fine

Date: 2019-06-04 10:44:40

Tags: java apache-spark

I am using the Spark Java API to submit a driver to a local Spark cluster (1 master + 1 worker). After calling startApplication with a listener attached, the first call to stateChanged reports the LOST state.

The driver is submitted successfully and runs fine on the worker.

I tried using a wait loop instead of a listener.

I tried Spark versions 2.3.1 and 2.4.3.

I tried this on both OSX and Ubuntu.

I tried changing the Spark master host to the machine's IP instead of its hostname.

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

// Build the launcher and submit the driver to the standalone master in cluster mode.
SparkLauncher launcher = new SparkLauncher(env)
    .setAppResource(path)
    .setMainClass("full.package.name.RTADriver")
    .setMaster("spark://" + sparkMasterHost + ":" + sparkMasterPort)
    .setAppName("rta_scala_app_")
    .setDeployMode("cluster")
    .setConf("spark.ui.enabled", "true")
    .addAppArgs(runnerStr)
    .setVerbose(true);

SparkAppHandle handle = launcher.startApplication();

// Poll the handle until it reports FINISHED (in practice it never does; the state goes to LOST instead).
while (!handle.getState().equals(SparkAppHandle.State.FINISHED)) {
    System.out.println("Wait Loop: App_ID: " + handle.getAppId() + " state: " + handle.getState());
    Thread.sleep(10000);
}
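For reference, the listener-based approach mentioned at the top looked conceptually like the sketch below. This is only an illustration: the CountDownLatch and the exact println messages are my additions, not the original code; the variables env, path, sparkMasterHost, and sparkMasterPort are reused from the snippet above.

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;
import java.util.concurrent.CountDownLatch;

// Sketch only: attach a listener instead of polling the handle.
CountDownLatch done = new CountDownLatch(1);

SparkAppHandle listenerHandle = new SparkLauncher(env)
    .setAppResource(path)
    .setMainClass("full.package.name.RTADriver")
    .setMaster("spark://" + sparkMasterHost + ":" + sparkMasterPort)
    .setDeployMode("cluster")
    .startApplication(new SparkAppHandle.Listener() {
        @Override
        public void stateChanged(SparkAppHandle h) {
            // Log every state transition; per the question, this reports LOST shortly after submit.
            System.out.println("State changed: App_ID: " + h.getAppId() + " state: " + h.getState());
            if (h.getState().isFinal()) {
                done.countDown();
            }
        }

        @Override
        public void infoChanged(SparkAppHandle h) {
            System.out.println("Info changed: App_ID: " + h.getAppId());
        }
    });

done.await(); // blocks until a final state (FINISHED, FAILED, KILLED, or LOST)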

System.out logs from my code:

First State App_ID: null state: UNKNOWN
Wait Loop: App_ID: null state: UNKNOWN
Wait Loop: App_ID: null state: LOST
Wait Loop: App_ID: null state: LOST
...

Relevant spark-submit logs:

INFO: 19/06/04 11:27:54 INFO Utils: Successfully started service 'driverClient' on port 52077.
INFO: 19/06/04 11:27:54 INFO TransportClientFactory: Successfully created connection to /10.10.0.179:7077 after 34 ms (0 ms spent in bootstraps)
INFO: 19/06/04 11:27:54 INFO ClientEndpoint: Driver successfully submitted as driver-20190604112754-0030
INFO: 19/06/04 11:27:54 INFO ClientEndpoint: ... waiting before polling master for driver state
INFO: 19/06/04 11:27:59 INFO ClientEndpoint: ... polling master for driver state
INFO: 19/06/04 11:27:59 INFO ClientEndpoint: State of driver-20190604112754-0030 is RUNNING
INFO: 19/06/04 11:27:59 INFO ClientEndpoint: Driver running on 10.10.0.179:49705 (worker-20190603154544-10.10.0.179-49705)
INFO: 19/06/04 11:27:59 INFO ShutdownHookManager: Shutdown hook called
INFO: 19/06/04 11:27:59 INFO ShutdownHookManager: Deleting directory /private/var/folders/90/pgndgkk11lj0qb4q5qw_f03c0000gn/T/spark-8d8d92b9-8d0c-43a1-8bb9-3d08f1519c53
Wait Loop: App_ID: null state: LOST
...

1 Answer:

Answer 0 (score: 1)

I just ran into the same situation. My guess is that because the deploy mode is "cluster", the Spark driver process runs on a different host from the one running the Spark Launcher process; as a result, the launcher "loses" its connection to the Spark application.
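If the handle really needs to track the application, one possible workaround (just a sketch, not verified against the original setup) is to submit in client deploy mode, so the driver runs on the same host as the launcher process and the handle keeps its connection, reporting states such as RUNNING and FINISHED instead of LOST. The variables below are the same placeholders used in the question.

// Sketch: same launcher configuration as in the question, but in client deploy mode
// so the driver stays local to the launcher process and the handle can keep tracking it.
SparkAppHandle clientHandle = new SparkLauncher(env)
    .setAppResource(path)
    .setMainClass("full.package.name.RTADriver")
    .setMaster("spark://" + sparkMasterHost + ":" + sparkMasterPort)
    .setDeployMode("client")   // instead of "cluster"
    .startApplication();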