Spark: StackOverflowError in map function

Asked: 2016-04-13 15:07:33

Tags: scala serialization apache-spark rdd

Here is my code. I am trying to compute range weights inside a map function; the per-key counts are stored in countsAsMap.

// Assumes min and max (Int) and counts (an RDD[(Int, Int)]) are defined earlier
import scala.collection.mutable.MutableList

val countsAsMap: collection.Map[Int, Int] = counts.collectAsMap
// countsAsMap: scala.collection.Map[Int,Int] = Map(137 -> 91, 146 -> 83, 218 -> 26, 227 -> 16, ...)
var rangeMatrix = MutableList[(Int, Int)]()
for (i <- min to max;
     j <- min to max) {
    if (i <= j) {
        rangeMatrix += ((i, j))
    }
}
// rangeMatrix: ((301,301), (300,301), (300,300), (299,301), (299,300), ...)
// Create an RDD from rangeMatrix so the pairs can be processed in parallel
var matrixRDD = sc.parallelize(rangeMatrix)
// For each (start, end) pair, sum the counts over that range
val rangeWeight = matrixRDD.map(r => {
    var total = 0
    for (k <- r._1 to r._2) {
        total = total + countsAsMap(k)
    }
    total
})
rangeWeight.take(1).foreach(println)

I get an error when running this. I have tried several approaches, but they all end with the exception below:

java.lang.StackOverflowError
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1108)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)

Note: if I run the same transformation (with the same map logic) over a plain collection (a List) instead of an RDD, it works fine.
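For reference, here is a minimal sketch of what that working List-based variant could look like. It is not the exact code from the question; it assumes the same min, max, and countsAsMap values as above, and simply keeps the (start, end) pairs in a local List so no Spark closure serialization is involved:

// Hypothetical local equivalent of the RDD pipeline above
val localMatrix: List[(Int, Int)] =
  (for (i <- min to max; j <- min to max if i <= j) yield (i, j)).toList

// Same per-pair computation, but on a plain Scala collection instead of an RDD
val localRangeWeight: List[Int] = localMatrix.map { case (start, end) =>
  (start to end).map(k => countsAsMap(k)).sum
}

localRangeWeight.take(1).foreach(println)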

0 Answers:

There are no answers yet.