I can't understand reduceByKey(_ + _) in the first Spark example with Scala
import org.apache.spark.SparkContext

object WordCount {
  def main(args: Array[String]): Unit = {
    val inputPath = args(0)
    val outputPath = args(1)
    val sc = new SparkContext()
    val lines = sc.textFile(inputPath)
    val wordCounts = lines.flatMap { line => line.split(" ") }
      .map(word => (word, 1))
      .reduceByKey(_ + _) **I can't understand this line**
    wordCounts.saveAsTextFile(outputPath)
  }
}
Answer 0 (score: 13)
Reduce takes two elements and produces a third after applying a function to the two arguments.

The code you showed is equivalent to the following:

reduceByKey((x, y) => x + y)

Instead of defining dummy variables and writing a lambda, Scala is smart enough to figure out that what you are trying to achieve is to apply a func (a sum, in this case) to whichever two arguments it receives, hence the syntax:

reduceByKey(_ + _)
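Here is a minimal sketch of that equivalence (assuming a SparkContext named sc is already available, e.g. in spark-shell; the pair data is made up for illustration):

val pairs = sc.parallelize(Seq(("spark", 1), ("scala", 1), ("spark", 1)))
// Three spellings of the same per-key reduction, from most to least explicit.
val explicit  = pairs.reduceByKey((x: Int, y: Int) => x + y)  // fully spelled-out lambda
val inferred  = pairs.reduceByKey((x, y) => x + y)            // argument types inferred from the RDD
val shorthand = pairs.reduceByKey(_ + _)                      // each _ stands for one argument
// All three collect to Array((scala,1), (spark,2)) (element order may vary).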
Answer 1 (score: 0)
reduceByKey takes the values for a key two at a time, applies a function to them, and returns the result.

reduceByKey(_ + _) is equivalent to reduceByKey((x, y) => x + y)

Example:
val numbers = Array(1, 2, 3, 4, 5)
val sum = numbers.reduceLeft[Int](_+_)
println("The sum of the numbers one through five is " + sum)
Result:
The sum of the numbers one through five is 15
numbers: Array[Int] = Array(1, 2, 3, 4, 5)
sum: Int = 15
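The reduceLeft example above works on a flat array. As a rough sketch in plain Scala (no Spark, data made up for illustration), the per-key version does the analogous thing: group the pairs by key, then reduce each key's values with the same _ + _ function:

val pairs = Seq(("spark", 1), ("scala", 1), ("spark", 1))
val counts = pairs
  .groupBy(_._1)                                                // group the pairs by their key (the word)
  .map { case (word, ps) => (word, ps.map(_._2).reduce(_ + _)) } // reduce each key's values with _ + _
println(counts)                                                 // Map(spark -> 2, scala -> 1) (order may vary)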
Similarly, reduceByKey(_ ++ _) is equivalent to reduceByKey((x, y) => x ++ y).
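A sketch along the same lines (plain Scala, hypothetical data): when the values are collections, _ ++ _ concatenates the two values for a key instead of adding them:

val visits = Seq(("user1", List("home")), ("user2", List("cart")), ("user1", List("checkout")))
val perUser = visits
  .groupBy(_._1)
  .map { case (user, vs) => (user, vs.map(_._2).reduce(_ ++ _)) } // same shape as reduceByKey(_ ++ _)
println(perUser)  // Map(user1 -> List(home, checkout), user2 -> List(cart)) (order may vary)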