Map output statuses were N bytes which exceeds spark.akka.frameSize - error in Spark

Asked: 2016-07-05 03:55:16

Tags: apache-spark akka

I am hitting this error in a simple Spark job. I don't want to increase spark.akka.frameSize before investigating the cause.

Error from the Spark reduceByKey operation -

16/07/04 19:50:43 ERROR MapOutputTrackerMasterEndpoint: Map output statuses were 209098002 bytes which exceeds spark.akka.frameSize (104857600 bytes).
16/07/04 19:50:46 ERROR MapOutputTrackerMasterEndpoint: Map output statuses were 209098002 bytes which exceeds spark.akka.frameSize (104857600 bytes).
16/07/04 19:50:49 ERROR MapOutputTrackerMasterEndpoint: Map output statuses were 209098002 bytes which exceeds spark.akka.frameSize (104857600 bytes).
16/07/04 19:50:49 ERROR MapOutputTrackerMasterEndpoint: Map output statuses were 209098002 bytes which exceeds spark.akka.frameSize (104857600 bytes).
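
For scale, the two figures in the log can be compared directly. The following is a minimal sketch, assuming only the numbers printed above and that spark.akka.frameSize is expressed in MB in Spark 1.x configuration; the object and value names are made up for illustration. It shows how far over the limit the serialized statuses are and is not a recommendation to raise the setting.

```scala
object FrameSizeCheck {
  // Figures copied from the log lines above.
  val statusBytes: Long = 209098002L
  val frameLimitBytes: Long = 104857600L

  val bytesPerMiB: Long = 1024L * 1024L

  // The configured limit expressed in MiB.
  val frameLimitMiB: Long = frameLimitBytes / bytesPerMiB

  // Smallest whole number of MiB that would hold the statuses (ceiling division).
  val requiredMiB: Long = (statusBytes + bytesPerMiB - 1) / bytesPerMiB

  // How many times larger the statuses are than the limit.
  val ratio: Double = statusBytes.toDouble / frameLimitBytes.toDouble

  def main(args: Array[String]): Unit =
    println(f"limit = $frameLimitMiB MiB, needed >= $requiredMiB MiB, ratio = $ratio%.2f")
}
```

The limit works out to exactly 100 MiB and the statuses to just under 200 MiB, so the message is roughly twice the allowed size rather than marginally over it.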

Code -

val events = inputFilesRdd.map {
  // Emit: (id, user, product) -> date
  splits => (splits(11), splits(4), splits(2)) -> splits(1)
}.reduceByKey((left, right) => minString(left, right))

minString is a simple function that compares two strings. The expected data size is in the higher gigabyte range.
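
The post does not show minString itself. As a rough sketch, assuming it returns the lexicographically smaller of its two arguments, the same reduction can be tried on a plain Scala collection without Spark. The names `MinStringSketch` and `reduceByKeyLocal` and the sample rows are hypothetical and only illustrate the shape of the data.

```scala
object MinStringSketch {
  // Hypothetical stand-in for the minString helper, which the post does not show:
  // returns the lexicographically smaller of the two strings.
  def minString(left: String, right: String): String =
    if (left.compareTo(right) <= 0) left else right

  // Local emulation of reduceByKey on an in-memory sequence (no Spark involved).
  def reduceByKeyLocal[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
    pairs.groupBy(_._1).map { case (key, kvs) => key -> kvs.map(_._2).reduce(f) }

  // Made-up sample rows shaped like (id, user, product) -> date.
  val sample: Seq[((String, String, String), String)] = Seq(
    ("id1", "u1", "p1") -> "2016-07-03",
    ("id1", "u1", "p1") -> "2016-07-01",
    ("id2", "u2", "p9") -> "2016-06-30"
  )

  def main(args: Array[String]): Unit =
    reduceByKeyLocal(sample)(minString).foreach(println)
}
```

With dates written as yyyy-MM-dd, as in the made-up sample, the lexicographic minimum is also the earliest date; whether that holds for the real input depends on its date format.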

The input files are gzip-compressed.

Any hints about this error?

0 answers:

No answers yet