Spark Scala: understanding reduceByKey(_ + _)

Posted: 2016-05-01 09:57:48

Tags: scala apache-spark word-count bigdata

I can't understand reduceByKey(_ + _) in the first Spark example with Scala:

import org.apache.spark.SparkContext

object WordCount {
  def main(args: Array[String]): Unit = {
    val inputPath = args(0)
    val outputPath = args(1)
    val sc = new SparkContext()
    val lines = sc.textFile(inputPath)
    val wordCounts = lines.flatMap { line => line.split(" ") }
      .map(word => (word, 1))
      .reduceByKey(_ + _)  // I can't understand this line
    wordCounts.saveAsTextFile(outputPath)
  }
}

2 Answers:

Answer 0 (score: 13)

Reduce takes two elements and produces a third by applying a function to those two arguments.

The code you show is equivalent to the following:

reduceByKey((x, y) => x + y)

Instead of defining dummy variables and writing a lambda, Scala is smart enough to figure out that what you are trying to achieve is to apply func (a sum, in this case) to whichever two arguments it receives, hence the shorter syntax:

reduceByKey(_ + _)
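To see the two spellings side by side, here is a minimal sketch (the SparkContext sc, the sample data, and the variable names are assumptions for illustration, not part of the original question):

// Minimal sketch: both calls below produce the same per-key sums.
// sc is an assumed, already-created SparkContext.
val pairs = sc.parallelize(Seq(("a", 1), ("b", 1), ("a", 1)))
val explicit  = pairs.reduceByKey((x, y) => x + y)  // ("a", 2), ("b", 1)
val shorthand = pairs.reduceByKey(_ + _)            // identical result

Both resulting RDDs contain ("a", 2) and ("b", 1); the underscore form is simply shorthand for the lambda.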

Answer 1 (score: 0)

reduceByKey takes a function of two arguments, applies it to the values that share the same key, and returns the combined value.

reduceByKey(_ + _) is equivalent to reduceByKey((x, y) => x + y)

Example:

val numbers = Array(1, 2, 3, 4, 5)
val sum = numbers.reduceLeft[Int](_+_)

println("The sum of the numbers one through five is " + sum)

Result:

The sum of the numbers one through five is 15
numbers: Array[Int] = Array(1, 2, 3, 4, 5)
sum: Int = 15

In the same way, reduceByKey(_ ++ _) is equivalent to reduceByKey((x, y) => x ++ y).
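For instance (a hedged sketch; the sample data and names below are assumed, not taken from the answer), when the values are Lists, _ ++ _ concatenates the lists that share a key:

// Sketch: sc is an assumed, already-created SparkContext.
// The values are List[Int], so _ ++ _ concatenates the lists per key.
val grouped = sc.parallelize(Seq(("a", List(1)), ("a", List(2)), ("b", List(3))))
val merged  = grouped.reduceByKey(_ ++ _)   // ("a", List(1, 2)), ("b", List(3))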