I am running into this error in a simple Spark job. Before investigating the cause, I don't want to just increase spark.akka.frameSize.
Spark reduceByKey operation error -
16/07/04 19:50:43 ERROR MapOutputTrackerMasterEndpoint: Map output statuses were 209098002 bytes which exceeds spark.akka.frameSize (104857600 bytes).
16/07/04 19:50:46 ERROR MapOutputTrackerMasterEndpoint: Map output statuses were 209098002 bytes which exceeds spark.akka.frameSize (104857600 bytes).
16/07/04 19:50:49 ERROR MapOutputTrackerMasterEndpoint: Map output statuses were 209098002 bytes which exceeds spark.akka.frameSize (104857600 bytes).
16/07/04 19:50:49 ERROR MapOutputTrackerMasterEndpoint: Map output statuses were 209098002 bytes which exceeds spark.akka.frameSize (104857600 bytes).
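For reference, spark.akka.frameSize is specified in megabytes, so the 104857600-byte limit in the log corresponds to a setting of 100. If raising it ever turns out to be necessary, it would look roughly like the sketch below (the value 200 and the app name are placeholders, not recommendations):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: frameSize is given in MB; 200 is an arbitrary example value.
val conf = new SparkConf()
  .setAppName("reduce-job")
  .set("spark.akka.frameSize", "200")
val sc = new SparkContext(conf)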
Code -
// inputFilesRdd is assumed to hold pre-split records (one Array[String] per input line)
val events = inputFilesRdd.map {
  // Emit : (id, user, product) -> date
  splits => (splits(11), splits(4), splits(2)) -> splits(1)
}.reduceByKey((left, right) => minString(left, right))
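For completeness, reduceByKey also has an overload that takes an explicit number of reduce partitions; if partition counts turn out to matter here, the call site would look roughly like this (2000 is only a placeholder value, not what the job actually uses):

// Hypothetical variant of the same reduction with an explicit reduce partition count.
val eventsExplicit = inputFilesRdd.map {
  splits => (splits(11), splits(4), splits(2)) -> splits(1)
}.reduceByKey((left, right) => minString(left, right), 2000)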
minString is a simple function that compares two strings. The expected data size is on the order of many gigabytes.
The input files are gzip-compressed.
Any hints on this error?
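(minString itself is not shown in the post; it presumably looks something like the sketch below, which just keeps the lexicographically smaller of the two strings.)

// Assumed shape of minString, not the actual implementation from the job:
def minString(left: String, right: String): String =
  if (left <= right) left else right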