Java / Spark - group by with weighted average aggregation

Date: 2016-02-03 20:05:03

Tags: java apache-spark

Data:

id | sector     | balance
---------------------------
1  | restaurant | 20000
2  | restaurant | 20000
3  | auto       | 10000
4  | auto       | 10000
5  | auto       | 10000

I would like to load this into Spark as a DataFrame and compute the sum of balance per group, but I also have to compute each group's balance as a percentage of the total balance (sum(balance) over all ids).

How can I do this?
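
For the sample data above, the expected result would presumably look like this (total balance = 70000):

sector     | sum(balance) | % of total
---------------------------------------
restaurant | 40000        | 57.14
auto       | 30000        | 42.86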

1 Answer:

Answer 0 (score: 2)

To get the percentage of the total, you can use DoubleRDDFunctions:

// Total balance across all ids (sum() comes from DoubleRDDFunctions).
val totalBalance = data.map(_._3.toDouble).sum()

// Each row's balance as a percentage of the total.
val percentageRow = data.map(d => d._3 * 100 / totalBalance)

// Sum of balance per sector, then each sector's share of the total.
val percentageGroup = data.map(d => (d._2, d._3))
  .reduceByKey((x, y) => x + y)
  .mapValues(sumGroup => sumGroup * 100 / totalBalance)
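
The answer does not show how `data` is defined. Below is a minimal, self-contained sketch under the assumption that `data` is an RDD of (id, sector, balance) tuples built from the table above; the SparkContext setup, app name, and local master are hypothetical details added for illustration:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver setup; in spark-shell, `sc` already exists.
val conf = new SparkConf().setAppName("weighted-balance").setMaster("local[*]")
val sc = new SparkContext(conf)

// The sample data as an RDD of (id, sector, balance) tuples.
val data = sc.parallelize(Seq(
  (1, "restaurant", 20000),
  (2, "restaurant", 20000),
  (3, "auto", 10000),
  (4, "auto", 10000),
  (5, "auto", 10000)
))

// Total balance over all ids.
val totalBalance = data.map(_._3.toDouble).sum()   // 70000.0

// Per-row percentage of the total.
val percentageRow = data.map(d => d._3 * 100 / totalBalance)

// Per-sector sum of balance, expressed as a percentage of the total.
val percentageGroup = data.map(d => (d._2, d._3))
  .reduceByKey(_ + _)
  .mapValues(sumGroup => sumGroup * 100 / totalBalance)

percentageGroup.collect().foreach(println)         // e.g. (restaurant,57.14...), (auto,42.85...)

Note that sum() is available on the RDD of doubles through the implicit conversion to DoubleRDDFunctions, while reduceByKey and mapValues come from PairRDDFunctions on the (sector, balance) pair RDD.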