What should I do if an executor node suddenly dies in Spark Streaming?

Time: 2018-02-15 01:26:48

Tags: apache-spark spark-streaming yarn

I am using Spark 1.6.

A few days ago, my Spark Streaming application (context) suddenly shut down. Looking at the logs, one of the executors appears to have gone down. (The machine had in fact been powered off.)

What should I do when this happens? (Note that dynamic allocation is not an option for me.)

When an executor goes down, I would expect its tasks to be reassigned to another executor. My application runs in yarn-client mode.
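Spark does retry tasks from a lost executor on the surviving ones, up to configurable failure limits. A minimal sketch of the relevant submit-time settings (the values and the application jar name below are illustrative, not recommendations):

```shell
# spark.task.maxFailures: task retries before the stage (and job) is failed (default 4).
# spark.yarn.max.executor.failures: executor failures tolerated before the
#   whole application is considered failed.
# "my-streaming-app.jar" is a placeholder for your application.
spark-submit \
  --master yarn-client \
  --conf spark.task.maxFailures=8 \
  --conf spark.yarn.max.executor.failures=16 \
  my-streaming-app.jar
```

If the application exits well before these limits are reached, the failure is usually on the driver side rather than the executor side.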

## Log example at the time of the shutdown

```
WARN TransportChannelHandler: Exception in connection from xxxx-hostname/12.34.56.789:12345
ERROR TransportResponseHandler: Still have 2 requests outstanding when connection from xxxx-hostname/12.34.56.789:12345 is closed
ERROR ContextCleaner: Error cleaning broadcast 1123293
WARN BlockManagerMaster: Failed to remove RDD 262104
...
ERROR TransportClient: Failed to send RPC 5940957964172608257 to xxxx-hostname/12.34.56.789:12345: java.nio.channels.ClosedChannelException
...
WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to get executor loss reason for executor id 5 at RPC address xxxx-hostname:12345, but got no response. Marking as slave lost. org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
```
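The last log line names the timeout that fired (`spark.rpc.askTimeout`). If the node is merely slow rather than dead, raising the network timeouts can help the driver ride out transient outages; a hard machine death still has to be handled by task retries. A sketch with illustrative values:

```shell
# spark.network.timeout is the umbrella default for several timeouts
# (including spark.rpc.askTimeout) in Spark 1.6.
# The 300s values and the jar name are placeholders.
spark-submit \
  --conf spark.network.timeout=300s \
  --conf spark.rpc.askTimeout=300s \
  my-streaming-app.jar
```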

1 Answer:

Answer 0 (score: 0):

Your HDFS filesystem (datanode space) is running out of space.
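If low HDFS capacity is suspected, it can be checked with the standard HDFS CLI before digging further into Spark itself:

```shell
# Per-datanode capacity, used, and remaining space.
hdfs dfsadmin -report

# Human-readable usage summary for the filesystem root.
hdfs dfs -df -h /
```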