Getting the average per key in Spark

Time: 2015-06-22 02:01:07

Tags: apache-spark

How do I compute the average value per key in Spark?

1 answer:

Answer 0 (score: -1)

We can compute the average per key in Spark with either combineByKey or foldByKey.

foldByKey

foldByKey(initialValue)((accumulatedValue, inputDataValue) => { /* combine the two and return a value of the same type */ })

Input data:

employee,department,salary
e1,d1,100
e2,d1,500
e5,d2,200
e6,d1,300
e7,d3,200
e7,d3,500
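As a sanity check on what the result should look like, here is a minimal sketch that computes the same per-department averages with plain Scala collections, no Spark involved. The object and method names (AverageByKeySketch, averages) are made up for illustration. It follows the same idea as foldByKey: pair every salary with a count of 1, then fold each department's pairs starting from (0, 0).

```scala
// Plain-Scala sketch (no Spark). Names here are hypothetical, for illustration only.
object AverageByKeySketch {
  // The sample rows from the input above, with the header line left out.
  val rows: Seq[String] = Seq(
    "e1,d1,100", "e2,d1,500", "e5,d2,200",
    "e6,d1,300", "e7,d3,200", "e7,d3,500"
  )

  // Returns department -> average salary.
  def averages(lines: Seq[String]): Map[String, Double] = {
    // Key each row by department and pair the salary with a count of 1.
    val pairs = lines.map(_.split(',')).map(f => (f(1), (f(2).toInt, 1)))
    pairs
      .groupBy(_._1)
      .map { case (dept, kvs) =>
        // Fold from the neutral element (0, 0), adding up sums and counts.
        val (sum, count) = kvs.map(_._2).foldLeft((0, 0)) {
          case ((s, c), (v, n)) => (s + v, c + n)
        }
        // Convert to Double before dividing to avoid integer truncation.
        dept -> sum.toDouble / count
      }
  }

  def main(args: Array[String]): Unit =
    averages(rows).toSeq.sortBy(_._1).foreach(println)
}
```

For this input the averages work out to 300 for d1 (900 / 3), 200 for d2 and 350 for d3 (700 / 2), which is what the Spark version should also produce.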

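In the code below, the trailing 1 in each value is the count. The type of the values being folded must match the type of initialValue. The initial value must also be neutral for the operation, because Spark may apply it more than once per key (once per partition); (0, 0) satisfies this for a running (sum, count).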

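// `data` is assumed to be an RDD[String] holding the lines above, e.g. from sc.textFile(...).
// Skip the header line if present, then build (department, (salary, 1)) pairs.
val depSalary = data.filter(!_.startsWith("employee")).map(_.split(',')).map(x => (x(1), (x(2).toInt, 1)))

// Neutral starting value: (sum, count) = (0, 0).
val zero = (0, 0)
val depSalarySumCount = depSalary.foldByKey(zero)((acc, value) => (acc._1 + value._1, acc._2 + value._2))

// Divide as Double so the average is not truncated by integer division.
val result = depSalarySumCount.map(x => (x._1, x._2._1.toDouble / x._2._2))
result.collect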
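The answer mentions combineByKey but only shows foldByKey. For comparison, here is a rough sketch of the same average using combineByKey. Everything around the combineByKey call is an assumption made so the example stands alone: the object name, the helper averageByKey, the local[*] master and the app name. On Spark versions before 1.3 you would also need import org.apache.spark.SparkContext._ for the pair-RDD methods.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical standalone wrapper; only combineByKey itself comes from the answer.
object AverageWithCombineByKey {
  // Takes (key, value) pairs and returns (key, average).
  def averageByKey(pairs: RDD[(String, Int)]): RDD[(String, Double)] = {
    val sumCount = pairs.combineByKey(
      // createCombiner: first value seen for a key within a partition
      (v: Int) => (v, 1),
      // mergeValue: fold another value into a partition-local (sum, count)
      (acc: (Int, Int), v: Int) => (acc._1 + v, acc._2 + 1),
      // mergeCombiners: merge (sum, count) pairs from different partitions
      (a: (Int, Int), b: (Int, Int)) => (a._1 + b._1, a._2 + b._2)
    )
    sumCount.mapValues { case (sum, count) => sum.toDouble / count }
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setAppName("avg-by-key").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val lines = sc.parallelize(Seq(
        "e1,d1,100", "e2,d1,500", "e5,d2,200",
        "e6,d1,300", "e7,d3,200", "e7,d3,500"))
      // Build (department, salary) pairs; no count is needed up front here.
      val depSalary = lines.map(_.split(',')).map(f => (f(1), f(2).toInt))
      averageByKey(depSalary).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Unlike foldByKey, combineByKey lets the accumulator type, here (Int, Int), differ from the value type (Int), so the salaries do not have to be paired with a 1 beforehand.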