I have an RDD[(String, Map[String, Int])]:
[("A", Map("acs"->2, "sdv"->2, "sfd"->1)), ("B", Map("ass"->2, "fvv"->2, "ffd"->1)), ("A", Map("acs"->2, "sdv"->2, "sfd"->1))]
I want to merge the elements that share the same key, summing the values of matching map keys:
[("A", Map("acs"->4, "sdv"->4, "sfd"->2)), ("B", Map("ass"->2, "fvv"->2, "ffd"->1))]
How can I do this in Scala?
Answer 0 (score: 3)
If you define mapSum
(see "merge two maps and sum values"):
def mapSum[T](map1: Map[T, Int], map2: Map[T, Int]): Map[T, Int] = map1 ++ map2.map{ case (k,v) => k -> (v + map1.getOrElse(k,0)) }
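To see how mapSum treats keys that appear in both maps versus only one, here is a minimal sketch on plain Scala maps, with no Spark required. The object name MapSumDemo and the sample values are illustrative assumptions, not part of the original answer.

```scala
object MapSumDemo {
  // Same definition as above: each entry of map2 is increased by the
  // matching count in map1 (or 0 if absent), then overrides map1's entry.
  def mapSum[T](map1: Map[T, Int], map2: Map[T, Int]): Map[T, Int] =
    map1 ++ map2.map { case (k, v) => k -> (v + map1.getOrElse(k, 0)) }

  def main(args: Array[String]): Unit = {
    val left  = Map("acs" -> 2, "sdv" -> 2) // "sdv" exists only here
    val right = Map("acs" -> 2, "sfd" -> 1) // "sfd" exists only here
    // "acs" is summed; "sdv" and "sfd" are carried over unchanged.
    println(mapSum(left, right))
  }
}
```

Keys found in only one of the two maps pass through untouched, so the function is safe to use as the combining step of a reduce.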
then you can group by key and reduce each group (similar to your other question):
@ rdd.groupBy(_._1).map(_._2.reduce((a, b) => (a._1, mapSum(a._2, b._2)))).collect
res11: Array[(String, Map[String, Int])] = Array(
("A", Map("acs" -> 4, "sdv" -> 4, "sfd" -> 2)),
("B", Map("ass" -> 2, "fvv" -> 2, "ffd" -> 1))
)
Answer 1 (score: 2)
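An efficient approach is to use reduceByKey, which merges the Maps for each key by summing the values of matching keys into the accumulator. Unlike groupBy, reduceByKey combines values within each partition before the shuffle, so less data generally moves across the network: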
val rdd = sc.parallelize(Seq(
  ("A", Map("acs"->2, "sdv"->2, "sfd"->1)),
  ("B", Map("ass"->2, "fvv"->2, "ffd"->1)),
  ("A", Map("acs"->2, "sdv"->2, "sfd"->1))
))
rdd.reduceByKey( (acc, m) =>
  acc ++ m.map{ case (k, v) => (k, acc.getOrElse(k, 0) + v) }
).collect
// res1: Array[(String, scala.collection.immutable.Map[String,Int])] = Array(
// (A,Map(acs -> 4, sdv -> 4, sfd -> 2)),
// (B,Map(ass -> 2, fvv -> 2, ffd -> 1))
// )
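The per-key accumulation that reduceByKey performs can be sketched on a local Seq, which may help when checking the merge logic without a Spark context. This is only an approximation of the semantics under that assumption: the names LocalReduceByKey, merge, and reduceLocally are hypothetical helpers, and real reduceByKey runs the same combining function distributed across partitions.

```scala
object LocalReduceByKey {
  type Counts = Map[String, Int]

  // Same combining function as in the reduceByKey call above.
  def merge(acc: Counts, m: Counts): Counts =
    acc ++ m.map { case (k, v) => (k, acc.getOrElse(k, 0) + v) }

  // Hypothetical local stand-in for reduceByKey: fold the pairs into one
  // accumulator per key, merging whenever the key has been seen before.
  def reduceLocally(pairs: Seq[(String, Counts)]): Map[String, Counts] =
    pairs.foldLeft(Map.empty[String, Counts]) { case (result, (key, m)) =>
      result.updated(key, result.get(key).map(merge(_, m)).getOrElse(m))
    }

  def main(args: Array[String]): Unit = {
    val data = Seq(
      ("A", Map("acs" -> 2, "sdv" -> 2, "sfd" -> 1)),
      ("B", Map("ass" -> 2, "fvv" -> 2, "ffd" -> 1)),
      ("A", Map("acs" -> 2, "sdv" -> 2, "sfd" -> 1))
    )
    reduceLocally(data).foreach(println)
  }
}
```

Because merge is associative and commutative for integer sums, the result does not depend on the order in which pairs are combined, which is the property reduceByKey relies on.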