Why does Spark fail with a FetchFailed error?

Date: 2016-03-10 13:37:39

Tags: scala apache-spark mesos apache-zeppelin

I'm running Apache Zeppelin on Apache Mesos, with 4 nodes and 210 GB of memory in total.

My Spark job performs a join between a small transaction dataset and a large event dataset. I want to match each transaction with the closest event, based on ID and time (event time against transaction time, ID against ID).

I get the following error:

FetchFailed(null, shuffleId=1, mapId=-1, reduceId=20,
  message=org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 1
    at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:542)
    at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:538)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.MapOutputTracker$.org$apache$spark$MapOutputTracker$$convertMapStatuses(MapOutputTracker.scala:538)
    at org.apache.spark.MapOutputTracker.getMapSizesByExecutorId(MapOutputTracker.scala:155)
    at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:47)
    at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:98)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
    at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:140)
    at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:136)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:136)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Here is my algorithm:

val groupRDD = event
    .map { evt => ((evt.id, evt.date_time.toString.dropRight(8)), evt) } // key each event by (id, truncated timestamp)
    .groupByKey(new HashPartitioner(128))
    .persist(StorageLevel.MEMORY_AND_DISK_SER)
val joinedRDD = groupRDD.rightOuterJoin(
    transactions.keyBy { transac => (transac.id, transac.dateTime.toString.dropRight(8)) })
val result = joinedRDD.mapValues { case (a, b) =>
    // a: Option[Iterable[event]] (None when no event shares the key), b: the transaction
    val goodTransac = a.getOrElse(List(GeoLoc("", 0L, "", "", "", "", "")))
        .reduce((v1, v2) => minDelay(b.dateTime, v1, v2)) // keep the event closest to the transaction time
    SomeClass(b.id, b....., goodTransac.date_time, .....)
}

The groupByKey shouldn't group too many elements (at most 50 per key), as the sanity check below confirms.
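A minimal sketch of that check, run against the groupRDD defined above:

// Sanity check (sketch): confirm that no key groups many more than ~50 events.
val maxGroupSize = groupRDD.mapValues(_.size).values.max()
println(s"largest group: $maxGroupSize elements") // expected to stay around 50 or below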

I noticed that the error occurred when memory ran short, so I decided to persist serialized to both RAM and disk, and to switch the serializer to Kryo. I also reduced spark.memory.storageFraction to 0.2 to leave more room for execution.
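For reference, those two configuration changes correspond to settings like the following (a sketch; everything else is left at its default):

import org.apache.spark.SparkConf

// Kryo serialization plus a smaller storage fraction, so more of the
// unified memory region goes to execution rather than caching.
val conf = new SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.memory.storageFraction", "0.2")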

When I check the web UI, I see GC taking more and more time as the processing goes on. When the job finally fails, GC has taken 20 minutes out of 22 minutes of run time, though not on all workers.

I have already reviewed Why do Spark jobs fail with org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0 in speculation mode?, but my cluster should still have enough memory - about 90 GB is still free on Mesos.

1 Answer:

Answer 0 (score: 0)

What I would do first is check the number of partitions of the event RDD and of the RDD after groupByKey, using RDD.getNumPartitions.
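A minimal sketch, using the names from the question's code:

// Compare partition counts before and after the shuffle.
println(s"event partitions:    ${event.getNumPartitions}")
println(s"groupRDD partitions: ${groupRDD.getNumPartitions}") // 128, from the HashPartitioner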

Using StorageLevel.MEMORY_AND_DISK_SER requires more IO, which can slow the executors down, and because of the SER part it can lead to longer GC pauses (after all, the dataset sits in memory and has to be serialized as well, which nearly doubles the memory requirement).

I would strongly recommend against using MEMORY_AND_DISK_SER at this point.

I would also review the dependency graph of the result RDD to see how many shuffles and how many partitions each stage uses:

result.toDebugString
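In the printed lineage, each indented block introduced by +- marks a shuffle boundary, and the number in parentheses at the start of every line is that RDD's partition count, so both pieces of information show up together.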

There are quite a few places where things can go wrong.

P.S. Attaching screenshots of the Jobs, Stages, Storage, and Executors pages from the web UI would help a lot in narrowing down the root cause.