Question

How does treeAggregate work with functions like max and min? I have two scenarios: one of them works, but the other does not.

scala> val z = sc.parallelize(List(1, 2, 3, 4, 5, 6), 2)
z: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24


scala> z.treeAggregate(0)(
     |   math.max(_, _), math.max(_, _)
     | )
res0: Int = 6


scala> z.treeAggregate(0)(
     |       seqOp = (U, v) => {
     |         math.max(U, v)
     |         U
     |       },
     |       combOp = (U1, U2) => {
     |         math.max(U1, U2)
     |         U1
     |       })
res1: Int = 0

Answer 1

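The value of the block

{
   math.max(U, v)
   U
}

is U: a Scala block evaluates to its last expression, so the result of math.max is computed and then discarded. In the second version seqOp therefore never changes the accumulator, so every partition returns the zero value 0, and combOp just keeps its left argument. The whole second construct returns the zero value 0 instead of the maximum. Change the code to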

z.treeAggregate(0)(
  seqOp = (u, v) => {
    math.max(u, v)
  },
  combOp = (u1, u2) => {
    math.max(u1, u2)
  })

so that it behaves like the first aggregation expression.
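To see why the two versions differ without a Spark cluster, here is a minimal plain-Scala sketch of the treeAggregate contract: each partition is folded with seqOp starting from the zero value, and the partial results are then merged with combOp. The helper `simulateTreeAggregate` is hypothetical, and the split into `[1, 2, 3]` and `[4, 5, 6]` is an assumption about how parallelize divides this list into 2 slices. Spark itself merges partials in a multi-level tree (controlled by the `depth` parameter), which gives the same result for associative, commutative operations like max and min. The sketch also shows the neutral zero values for min and max: `0` only works for max here because every element is non-negative.

```scala
// Hypothetical helper: a sequential model of treeAggregate's semantics.
// Each "partition" is folded with seqOp from zeroValue; the per-partition
// results are then folded together with combOp, again from zeroValue.
def simulateTreeAggregate[T, U](partitions: Seq[Seq[T]], zeroValue: U)(
    seqOp: (U, T) => U,
    combOp: (U, U) => U): U =
  partitions.map(_.foldLeft(zeroValue)(seqOp)).foldLeft(zeroValue)(combOp)

// Assumed split of List(1, 2, 3, 4, 5, 6) into 2 partitions.
val partitions = Seq(Seq(1, 2, 3), Seq(4, 5, 6))

// Broken version from the question: each block evaluates to its last
// expression, so the result of math.max is thrown away.
val brokenResult = simulateTreeAggregate(partitions, 0)(
  seqOp = (u, v) => { math.max(u, v); u },
  combOp = (u1, u2) => { math.max(u1, u2); u1 })

// Fixed version: the functions return the result of math.max.
val fixedResult = simulateTreeAggregate(partitions, 0)(
  seqOp = (u, v) => math.max(u, v),
  combOp = (u1, u2) => math.max(u1, u2))

// For min, a zero value of 0 would beat every positive element, so start
// from min's neutral element Int.MaxValue (and Int.MinValue for max).
val minResult = simulateTreeAggregate(partitions, Int.MaxValue)(
  (u, v) => math.min(u, v), (u1, u2) => math.min(u1, u2))
val maxResult = simulateTreeAggregate(partitions, Int.MinValue)(
  (u, v) => math.max(u, v), (u1, u2) => math.max(u1, u2))

println(s"broken=$brokenResult fixed=$fixedResult min=$minResult max=$maxResult")
// prints: broken=0 fixed=6 min=1 max=6
```

For this particular task, RDD also provides `max()` and `min()` directly (for example `z.max()`), which take an implicit `Ordering` and need no zero value at all.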
