Unable to understand fold() behavior in Spark

Time: 2018-06-30 16:15:52

Tags: apache-spark

I am new to Spark. I ran the following Spark program:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("FoldFunction").master("local").getOrCreate()
    val data = spark.sparkContext.parallelize(List(("Maths", 10), ("English", 10), ("Social", 10), ("Science", 10)))
    val extraMarks = ("extra", 10)
    // Sum the marks, starting from extraMarks as the zero value
    val foldedData = data.fold(extraMarks) { (acc, marks) =>
      val add = acc._2 + marks._2
      ("total", add)
    }

    println(foldedData)

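By my analysis, the code should add the 10 extra marks to the total of the four subjects, which would give (total,50). But the answer I get is (total,60).

Can someone explain whether my analysis is correct?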

1 answer:

Answer 0: (score: 0)

The API documentation says the following:

  

     * @param zeroValue the initial value for the accumulated result of each partition for the op
     *                  operator, and also the initial value for the combine results from different
     *                  partitions for the op operator - this will typically be the neutral
     *                  element (e.g. Nil for list concatenation or 0 for summation)
     * @param op an operator used to both accumulate results within a partition and combine results
     *           from different partitions
     */
    def fold(zeroValue: T)(op: (T, T) => T): T

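So zeroValue is normally set to a neutral element such as 0 or Nil.

But your zeroValue is ("extra", 10). It is used once as the starting value inside each partition and then once more when the partition results are combined, which is how you end up with (total,60).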
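If the goal is (total,50), one option is to fold with a neutral zero value and add the extra marks once, outside the fold. The sketch below assumes Spark is on the classpath; the object name `FoldWithNeutralZero` and the helper `totalWithExtra` are made up for illustration and are not part of the Spark API.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object FoldWithNeutralZero {
  // Hypothetical helper: sums the marks with a neutral zero value,
  // then adds the extra marks exactly once.
  def totalWithExtra(data: RDD[(String, Int)], extra: (String, Int)): (String, Int) = {
    // ("total", 0) contributes nothing, no matter how many times fold applies it.
    val subjectTotal = data.fold(("total", 0)) { (acc, marks) =>
      ("total", acc._2 + marks._2)
    }
    ("total", subjectTotal._2 + extra._2)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("FoldFunction").master("local").getOrCreate()
    val data = spark.sparkContext.parallelize(
      List(("Maths", 10), ("English", 10), ("Social", 10), ("Science", 10)))

    println(totalWithExtra(data, ("extra", 10))) // (total,50)
    spark.stop()
  }
}
```

Because the zero value is neutral here, the result does not depend on how many partitions the RDD has.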

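Let's go through it step by step. With master("local") the RDD has a single partition, so zeroValue is applied twice: once inside that partition and once in the final combine.

First, acc is (extra,10) and marks is (Maths,10), so 10 + 10 = 20, giving (total, 20)
Second, acc is (total,20) and marks is (English,10), so 20 + 10 = 30, giving (total, 30)
Third, acc is (total,30) and marks is (Social,10), so 30 + 10 = 40, giving (total, 40)
Fourth, acc is (total,40) and marks is (Science,10), so 40 + 10 = 50, giving (total, 50)
Finally, the combine step folds zeroValue (extra,10) with the partition result (total,50), so 10 + 50 = 60, giving (total, 60)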
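The steps above can be reproduced without Spark. The following is a minimal sketch in plain Scala that imitates what the documentation describes: each partition is folded starting from zeroValue, and the partition results are then folded starting from zeroValue again. `FoldSimulation` and `simulateFold` are hypothetical names for this illustration, and the nested sequences stand in for RDD partitions. It is not how Spark is implemented internally.

```scala
object FoldSimulation {
  type Mark = (String, Int)

  // Same operator as in the question: add the marks and label the result "total".
  val op: (Mark, Mark) => Mark = (acc, marks) => ("total", acc._2 + marks._2)

  // Hypothetical helper, not part of Spark: folds every partition from zeroValue,
  // then folds the partition results from zeroValue once more.
  def simulateFold(partitions: Seq[Seq[Mark]], zeroValue: Mark): Mark = {
    val perPartition = partitions.map(_.foldLeft(zeroValue)(op))
    perPartition.foldLeft(zeroValue)(op)
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(("Maths", 10), ("English", 10), ("Social", 10), ("Science", 10))
    val extraMarks = ("extra", 10)

    // One partition: 40 + 10 (inside the partition) + 10 (final combine)
    println(simulateFold(Seq(data), extraMarks)) // (total,60)

    // Two partitions of two subjects each: 40 + 2 * 10 (partitions) + 10 (final combine)
    println(simulateFold(data.grouped(2).toSeq, extraMarks)) // (total,70)
  }
}
```

In this model a non-neutral zeroValue is counted once per partition plus once in the final combine, so the result changes with the number of partitions. That is why the documentation recommends a neutral element.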