Split a Map into multiple Maps

Time: 2018-07-11 13:46:05

Tags: scala

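I need to compute the differences between two (huge) Maps. To parallelize the task, I want to split both Maps by the hash of their keys and create smaller Maps, one per range of hash values.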

How can I achieve this in (idiomatic) Scala?
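The question asks for splitting by ranges of the hash value, whereas the answer below buckets keys by hash modulo. A minimal sketch of range-based splitting, assuming the key's 32-bit hashCode is the hash meant; HashRangeSplit, rangeBucket and splitByHashRange are hypothetical names chosen for illustration:

```scala
// Hypothetical helper object: splits a Map into n submaps, where submap i
// holds the keys whose hashCode falls into the i-th of n contiguous ranges
// of the 32-bit hash space [Int.MinValue, Int.MaxValue].
object HashRangeSplit {
  def rangeBucket(hash: Int, n: Int): Int = {
    require(n > 0, "n must be positive")
    // Shift the hash into [0, 2^32) as a Long, then scale it down to [0, n).
    ((hash.toLong - Int.MinValue) * n / (1L << 32)).toInt
  }

  // Always returns exactly n submaps; a range with no keys yields an empty map.
  def splitByHashRange[K, V](m: Map[K, V], n: Int): IndexedSeq[Map[K, V]] = {
    val grouped = m.groupBy { case (k, _) => rangeBucket(k.hashCode, n) }
    (0 until n).map(i => grouped.getOrElse(i, Map.empty[K, V]))
  }

  def main(args: Array[String]): Unit = {
    val m = (0 to 10).map(i => (i, i * i)).toMap
    val parts = splitByHashRange(m, 4)
    parts.zipWithIndex.foreach { case (p, i) =>
      println(s"bucket $i: ${p.keys.toList.sorted}")
    }
  }
}
```

With small Int keys such as 0 to 10, every key lands in bucket 2: Int.hashCode is the value itself, and all those values fall into the same quarter of the hash space. Range splitting only balances well when the hashes are spread over the whole Int range; otherwise modulo (or a mixing hash function) distributes the keys more evenly.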

1 answer:

Answer (score: 0)

Here is a rough sketch to help you get started with the Scala syntax:

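// Create two (slightly different) maps and print them side by side as a table
val rnd = new scala.util.Random
val originalMap1 = (0 to 10).map(i => (i, i * i)).toMap
val originalMap2 = (0 to 10).map(i => (i, i * i + rnd.nextInt(2))).toMap
for (i <- 0 to 10) {
  val a = originalMap1(i)
  val b = originalMap2(i)
  val marker = if (a == b) "" else " <-"
  println(s"$i: $a $b $marker")
}

// Subdivide each map into smaller maps by key hash.
// Math.floorMod keeps the bucket index in [0, numSubmaps) even for negative
// hash codes, where `%` would yield negative buckets that the loop below skips.
val numSubmaps = 5
val submaps1 = originalMap1.groupBy { case (k, _) => Math.floorMod(k.hashCode, numSubmaps) }
val submaps2 = originalMap2.groupBy { case (k, _) => Math.floorMod(k.hashCode, numSubmaps) }

// Compare each corresponding pair of submaps separately, then merge the diffs.
// Each pair is independent of the others, so this is the step to parallelize.
val diffs = (for (s <- 0 until numSubmaps) yield {
  // getOrElse: a bucket that received no keys is simply an empty map
  val m1 = submaps1.getOrElse(s, Map.empty[Int, Int])
  val m2 = submaps2.getOrElse(s, Map.empty[Int, Int])
  for {
    (k, a) <- m1
    b <- m2.get(k) // keys missing from m2 are skipped instead of throwing
    if a != b
  } yield (k, (a, b))
}).reduce(_ ++ _)

println(diffs.toList.sortBy(_._1))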

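Input (the two maps printed side by side; "<-" marks keys whose values differ, and because the second map adds random offsets, the exact rows vary between runs):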

0: 0 1  <-
1: 1 2  <-
2: 4 4 
3: 9 9 
4: 16 16 
5: 25 26  <-
6: 36 36 
7: 49 49 
8: 64 65  <-
9: 81 82  <-
10: 100 101  <-

Output:

List((0,(0,1)), (1,(1,2)), (5,(25,26)), (8,(64,65)), (9,(81,82)), (10,(100,101)))
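The loop in the answer compares the bucket pairs one after another, so it does not yet parallelize anything. A minimal sketch of running the per-bucket comparisons concurrently, assuming standard-library Futures on the global ExecutionContext are acceptable; unlike the answer's loop, it also reports keys that exist in only one map (with None on the missing side). ParallelMapDiff, diffPair and parallelDiff are hypothetical names:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ParallelMapDiff {
  // Bucket index in [0, n), also for negative hash codes.
  private def bucket(key: Any, n: Int): Int = Math.floorMod(key.hashCode, n)

  // Diff of one pair of submaps: a key is reported when it is missing on
  // one side (None) or when the two values differ.
  private def diffPair[K, V](m1: Map[K, V], m2: Map[K, V]): Map[K, (Option[V], Option[V])] =
    (m1.keySet ++ m2.keySet).iterator
      .map(k => (k, (m1.get(k), m2.get(k))))
      .filter { case (_, (a, b)) => a != b }
      .toMap

  def parallelDiff[K, V](map1: Map[K, V], map2: Map[K, V], n: Int): Map[K, (Option[V], Option[V])] = {
    val parts1 = map1.groupBy { case (k, _) => bucket(k, n) }
    val parts2 = map2.groupBy { case (k, _) => bucket(k, n) }
    // One Future per bucket; the buckets hold disjoint keys, so the partial
    // results can be merged with ++ without losing entries.
    val futures = (0 until n).map { i =>
      Future(diffPair(parts1.getOrElse(i, Map.empty[K, V]), parts2.getOrElse(i, Map.empty[K, V])))
    }
    Await.result(Future.sequence(futures), Duration.Inf)
      .foldLeft(Map.empty[K, (Option[V], Option[V])])(_ ++ _)
  }

  def main(args: Array[String]): Unit = {
    val a = Map(1 -> "x", 2 -> "y", 3 -> "z")
    val b = Map(2 -> "y", 3 -> "Z", 4 -> "w")
    // prints List((1,(Some(x),None)), (3,(Some(z),Some(Z))), (4,(None,Some(w))))
    println(parallelDiff(a, b, 3).toList.sortBy(_._1))
  }
}
```

Blocking with Await and Duration.Inf keeps the sketch short. For truly huge maps, the sequential groupBy step may itself become a bottleneck; the scala-parallel-collections module (.par) is an alternative to explicit Futures.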