Spark seems to have no fault tolerance for dead workers

Posted: 2017-02-22 09:50:06

Tags: apache-spark apache-spark-mllib

I have a test Spark cluster on AWS (1 master + 5 workers, all m4.2xlarge instances running Spark 2.1.0 and Scala 2.11.8), and I am running the ALS demo code in the Spark shell to test performance.

I've noticed that when I terminate worker machines (keeping the master up the whole time), the workload is redistributed to the remaining workers, but when I kill all of the workers, the job usually dies completely instead of patiently waiting for more workers to come online. Is this normal behavior?

My shell session is below. The first few lines are the ALS application; the rest are error messages. You'll notice that the first time I killed all of the workers (executor IDs 0, 1, 2, 3, 4), the shell waited for more workers to come online, as it should. Once I brought up more workers (IDs 10, 11, 12, 13, 14), the application continued running. But when I then killed those new workers as well, the whole job aborted with SparkException: Job aborted due to stage failure.

Is this normal behavior? If not, what am I doing wrong? If so, how can I make Spark more tolerant of (possibly all) workers dying? Any insight into this would be much appreciated.
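(For reference, the job aborting after a task "failed 4 times" is consistent with Spark's default spark.task.maxFailures of 4. Below is a minimal, untested sketch of how I assume that limit could be raised before the context is created; spark.task.maxFailures is a standard Spark property, but I have not verified that raising it actually covers the all-workers-dead case, and the app name and value of 16 are arbitrary illustrations.)

// Hypothetical sketch: raise the per-task failure limit so the job tolerates
// more executor losses before aborting. spark.task.maxFailures defaults to 4;
// the value 16 below is just an illustration.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("als-fault-tolerance-test")        // hypothetical app name
  .setMaster("spark://xxx.xxx.xxx.133:7077")     // same standalone master as below
  .set("spark.task.maxFailures", "16")           // default is 4

val sc = new SparkContext(conf)

The same property can presumably also be passed to spark-shell with --conf spark.task.maxFailures=16.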

Spark context Web UI available at http://xxx.xxx.xxx.133:4040
Spark context available as 'sc' (master = spark://xxx.xxx.xxx.133:7077, app id = app-20170222012148-0005).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_92)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :paste
// Entering paste mode (ctrl-D to finish)

import org.apache.spark.mllib.recommendation._

val data = sc.textFile("s3n://my.bucket/training-set.tsv")
val ratings = data.map(_.split('\t') match { case Array(user, item, rate) =>
  Rating(user.toInt, item.toInt, rate.toDouble)
})

val rank = 10
val numIterations = 10
val model = ALS.train(ratings, rank, numIterations, 0.01)

// Exiting paste mode, now interpreting.

[Stage 0:===========>                                             (5 + 19) / 24]
17/02/22 01:23:32 ERROR TaskSchedulerImpl: Lost executor 1 on xxx.xxx.xxx.174: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 11.0 in stage 0.0 (TID 11, xxx.xxx.xxx.174, executor 1): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 16.0 in stage 0.0 (TID 16, xxx.xxx.xxx.174, executor 1): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, xxx.xxx.xxx.174, executor 1): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 6.0 in stage 0.0 (TID 6, xxx.xxx.xxx.174, executor 1): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TransportChannelHandler: Exception in connection from /xxx.xxx.xxx.118:60180
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:899)
    at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:275)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:652)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:575)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:489)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:451)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    at java.lang.Thread.run(Thread.java:745)
17/02/22 01:23:32 ERROR TaskSchedulerImpl: Lost executor 0 on xxx.xxx.xxx.118: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 16.1 in stage 0.0 (TID 26, xxx.xxx.xxx.118, executor 0): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 4.0 in stage 0.0 (TID 4, xxx.xxx.xxx.118, executor 0): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 9.0 in stage 0.0 (TID 9, xxx.xxx.xxx.118, executor 0): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 ERROR TaskSchedulerImpl: Lost executor 2 on xxx.xxx.xxx.253: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 23.0 in stage 0.0 (TID 23, xxx.xxx.xxx.253, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 8.0 in stage 0.0 (TID 8, xxx.xxx.xxx.253, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 13.0 in stage 0.0 (TID 13, xxx.xxx.xxx.253, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 1.1 in stage 0.0 (TID 25, xxx.xxx.xxx.253, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 18.0 in stage 0.0 (TID 18, xxx.xxx.xxx.253, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3, xxx.xxx.xxx.253, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 4.1 in stage 0.0 (TID 30, xxx.xxx.xxx.253, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 14.1 in stage 0.0 (TID 33, xxx.xxx.xxx.253, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 ERROR TaskSchedulerImpl: Lost executor 4 on xxx.xxx.xxx.200: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 19.1 in stage 0.0 (TID 32, xxx.xxx.xxx.200, executor 4): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 9.1 in stage 0.0 (TID 29, xxx.xxx.xxx.200, executor 4): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 20.0 in stage 0.0 (TID 20, xxx.xxx.xxx.200, executor 4): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, xxx.xxx.xxx.200, executor 4): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 10.0 in stage 0.0 (TID 10, xxx.xxx.xxx.200, executor 4): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 21.1 in stage 0.0 (TID 28, xxx.xxx.xxx.200, executor 4): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 15.0 in stage 0.0 (TID 15, xxx.xxx.xxx.200, executor 4): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:32 WARN TaskSetManager: Lost task 6.1 in stage 0.0 (TID 24, xxx.xxx.xxx.200, executor 4): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:33 ERROR TaskSchedulerImpl: Lost executor 3 on xxx.xxx.xxx.136: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:33 WARN TaskSetManager: Lost task 4.2 in stage 0.0 (TID 35, xxx.xxx.xxx.136, executor 3): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:33 WARN TaskSetManager: Lost task 17.0 in stage 0.0 (TID 17, xxx.xxx.xxx.136, executor 3): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:33 WARN TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2, xxx.xxx.xxx.136, executor 3): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:33 WARN TaskSetManager: Lost task 16.2 in stage 0.0 (TID 31, xxx.xxx.xxx.136, executor 3): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:33 WARN TaskSetManager: Lost task 7.0 in stage 0.0 (TID 7, xxx.xxx.xxx.136, executor 3): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:33 WARN TaskSetManager: Lost task 14.2 in stage 0.0 (TID 34, xxx.xxx.xxx.136, executor 3): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:33 WARN TaskSetManager: Lost task 11.1 in stage 0.0 (TID 27, xxx.xxx.xxx.136, executor 3): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:23:33 WARN TaskSetManager: Lost task 12.0 in stage 0.0 (TID 12, xxx.xxx.xxx.136, executor 3): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
[Stage 0:==============================>                         (13 + 11) / 24]
17/02/22 01:26:29 ERROR TaskSchedulerImpl: Lost executor 13 on xxx.xxx.xxx.136: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:29 ERROR TaskSchedulerImpl: Lost executor 14 on xxx.xxx.xxx.200: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:29 ERROR TaskSchedulerImpl: Lost executor 11 on xxx.xxx.xxx.118: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:29 WARN TaskSetManager: Lost task 20.1 in stage 0.0 (TID 50, xxx.xxx.xxx.118, executor 11): ExecutorLostFailure (executor 11 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:29 WARN TaskSetManager: Lost task 5.1 in stage 0.0 (TID 49, xxx.xxx.xxx.118, executor 11): ExecutorLostFailure (executor 11 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:29 WARN TaskSetManager: Lost task 6.2 in stage 0.0 (TID 45, xxx.xxx.xxx.118, executor 11): ExecutorLostFailure (executor 11 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:29 WARN TaskSetManager: Lost task 10.1 in stage 0.0 (TID 48, xxx.xxx.xxx.118, executor 11): ExecutorLostFailure (executor 11 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:29 WARN TaskSetManager: Lost task 9.2 in stage 0.0 (TID 51, xxx.xxx.xxx.118, executor 11): ExecutorLostFailure (executor 11 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
[Stage 0:==============================>                          (13 + 8) / 24]
17/02/22 01:26:30 ERROR TaskSchedulerImpl: Lost executor 10 on xxx.xxx.xxx.174: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:30 WARN TaskSetManager: Lost task 2.1 in stage 0.0 (TID 41, xxx.xxx.xxx.174, executor 10): ExecutorLostFailure (executor 10 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:30 WARN TaskSetManager: Lost task 14.3 in stage 0.0 (TID 38, xxx.xxx.xxx.174, executor 10): ExecutorLostFailure (executor 10 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:30 WARN TaskSetManager: Lost task 16.3 in stage 0.0 (TID 40, xxx.xxx.xxx.174, executor 10): ExecutorLostFailure (executor 10 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:30 ERROR TaskSetManager: Task 16 in stage 0.0 failed 4 times; aborting job
17/02/22 01:26:30 WARN TaskSetManager: Lost task 4.3 in stage 0.0 (TID 43, xxx.xxx.xxx.174, executor 10): ExecutorLostFailure (executor 10 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:30 WARN TaskSetManager: Lost task 11.2 in stage 0.0 (TID 37, xxx.xxx.xxx.174, executor 10): ExecutorLostFailure (executor 10 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:30 WARN TaskSetManager: Lost task 9.3 in stage 0.0 (TID 60, xxx.xxx.xxx.174, executor 10): ExecutorLostFailure (executor 10 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:30 WARN TaskSetManager: Lost task 12.1 in stage 0.0 (TID 36, xxx.xxx.xxx.174, executor 10): ExecutorLostFailure (executor 10 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:30 WARN TaskSetManager: Lost task 7.1 in stage 0.0 (TID 39, xxx.xxx.xxx.174, executor 10): ExecutorLostFailure (executor 10 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:30 ERROR TaskSchedulerImpl: Lost executor 12 on xxx.xxx.xxx.253: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:30 WARN TaskSetManager: Lost task 6.3 in stage 0.0 (TID 62, xxx.xxx.xxx.253, executor 12): ExecutorLostFailure (executor 12 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:30 WARN TaskSetManager: Lost task 1.2 in stage 0.0 (TID 56, xxx.xxx.xxx.253, executor 12): ExecutorLostFailure (executor 12 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:30 WARN TaskSetManager: Lost task 20.2 in stage 0.0 (TID 64, xxx.xxx.xxx.253, executor 12): ExecutorLostFailure (executor 12 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:30 WARN TaskSetManager: Lost task 8.1 in stage 0.0 (TID 58, xxx.xxx.xxx.253, executor 12): ExecutorLostFailure (executor 12 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:30 WARN TaskSetManager: Lost task 10.2 in stage 0.0 (TID 61, xxx.xxx.xxx.253, executor 12): ExecutorLostFailure (executor 12 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:30 WARN TaskSetManager: Lost task 5.2 in stage 0.0 (TID 63, xxx.xxx.xxx.253, executor 12): ExecutorLostFailure (executor 12 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:30 WARN TaskSetManager: Lost task 3.1 in stage 0.0 (TID 54, xxx.xxx.xxx.253, executor 12): ExecutorLostFailure (executor 12 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
17/02/22 01:26:30 WARN TaskSetManager: Lost task 13.1 in stage 0.0 (TID 57, xxx.xxx.xxx.253, executor 12): ExecutorLostFailure (executor 12 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 16 in stage 0.0 failed 4 times, most recent failure: Lost task 16.3 in stage 0.0 (TID 40, xxx.xxx.xxx.174, executor 10): ExecutorLostFailure (executor 10 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
  at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
  at org.apache.spark.ml.recommendation.ALS$.train(ALS.scala:694)
  at org.apache.spark.mllib.recommendation.ALS.run(ALS.scala:253)
  at org.apache.spark.mllib.recommendation.ALS$.train(ALS.scala:340)
  at org.apache.spark.mllib.recommendation.ALS$.train(ALS.scala:357)
  ... 53 elided

scala>

0 Answers