Cannot run collect in Spark

Asked: 2014-12-10 03:28:36

Tags: apache-spark

My application fails as shown below, and I would like to know the possible causes. Could insufficient memory lead to this? There is no problem when running locally or on a smaller dataset.

2014-12-09 21:51:47,830 WARN org.apache.spark.Logging$class.logWarning(Logging.scala:71) - Lost task 60.1 in stage 1.1 (TID 566, server-21): java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_4_piece0 of broadcast_4
org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:930)
org.apache.spark.broadcast.TorrentBroadcast.readObject(TorrentBroadcast.scala:155)
sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
java.lang.reflect.Method.invoke(Method.java:597)
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1775)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1327)
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1969)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1775)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1327)
java.io.ObjectInputStream.readObject(ObjectInputStream.java:349)
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:160)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
java.lang.Thread.run(Thread.java:662)
2014-12-09 21:51:49,460 INFO org.apache.spark.Logging$class.logInfo(Logging.scala:59) - Starting task 60.2 in stage 1.1 (TID 603, server-11, PROCESS_LOCAL, 1295 bytes)
2014-12-09 21:51:49,461 INFO org.apache.spark.Logging$class.logInfo(Logging.scala:59) - Lost task 9.3 in stage 1.1 (TID 579) on executor server-11: java.io.IOException (org.apache.spark.SparkException: Failed to get broadcast_4_piece0 of broadcast_4) [duplicate 1]
2014-12-09 21:51:49,487 ERROR org.apache.spark.Logging$class.logError(Logging.scala:75) - Task 9 in stage 1.1 failed 4 times; aborting job
2014-12-09 21:51:49,494 INFO org.apache.spark.Logging$class.logInfo(Logging.scala:59) - Cancelling stage 1
2014-12-09 21:51:49,498 INFO org.apache.spark.Logging$class.logInfo(Logging.scala:59) - Stage 1 was cancelled
2014-12-09 21:51:49,511 INFO org.apache.spark.Logging$class.logInfo(Logging.scala:59) - Failed to run collect at StatVideoService.scala:62

1 Answer:

Answer 0 (score: 1)

Most likely one of the executors ran out of memory and was killed. You need to check the executor logs (and/or YARN's NodeManager logs, if you are running on YARN).
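A sketch of how you might follow that advice on a YARN cluster. The application ID below is hypothetical, and the memory sizes are placeholders to adjust for your workload; the commands themselves (`yarn logs` and the standard `spark-submit` memory flags) are part of Hadoop and Spark.

```shell
# 1. Pull the aggregated container logs for the finished application
#    (hypothetical application ID; find yours in the YARN ResourceManager UI)
#    and look for signs of the container being killed for exceeding memory.
yarn logs -applicationId application_1418000000000_0001 \
  | grep -iE "killed|OutOfMemory|beyond.*memory limits"

# 2. If an executor was indeed killed for memory, retry with more
#    executor memory and off-heap overhead (placeholder sizes).
spark-submit \
  --master yarn \
  --executor-memory 4g \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  --class com.example.StatVideoService \
  my-app.jar
```

The typical NodeManager message to look for is along the lines of "Container killed by YARN for exceeding memory limits"; when the JVM heap itself overflows you would instead see `java.lang.OutOfMemoryError` in the executor's own stderr.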