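I need to process the differences between two (huge) Maps. To parallelize the task, I want to split the two Maps by key hash value and create smaller Maps (one per range of hash values).
How can I achieve this in (idiomatic) Scala?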
Answer 0 (score: 0)
Below is a rough sketch to help you get started with the Scala syntax:
// Create two (slightly different) maps and print them side by side as a table.
val rnd = new scala.util.Random
val originalMap1 = (0 to 10).map(i => (i, i * i)).toMap
val originalMap2 = (0 to 10).map(i => (i, i * i + rnd.nextInt(2))).toMap
for (i <- 0 to 10) {
  val a = originalMap1(i)
  val b = originalMap2(i)
  val marker = if (a == b) "" else " <-"
  println(s"$i: $a $b $marker")
}

// Subdivide into smaller maps, bucketing by key hash.
// Math.floorMod keeps the bucket index non-negative even for negative hash codes.
val numSubmaps = 5
val submaps1 = originalMap1.groupBy { case (k, _) => Math.floorMod(k.hashCode, numSubmaps) }
val submaps2 = originalMap2.groupBy { case (k, _) => Math.floorMod(k.hashCode, numSubmaps) }

// Compare each corresponding pair of submaps separately, then merge the diffs.
// Note: this assumes both maps contain exactly the same keys.
val diffs = (for (s <- 0 until numSubmaps) yield {
  // A bucket may be empty, so fall back to an empty map instead of throwing.
  val m1 = submaps1.getOrElse(s, Map.empty[Int, Int])
  val m2 = submaps2.getOrElse(s, Map.empty[Int, Int])
  for {
    k <- m1.keys
    a = m1(k)
    b = m2(k)
    if a != b
  } yield (k, (a, b))
}).reduce(_ ++ _)
println(diffs.toList.sortBy(_._1))
Input (the two maps side by side; the values differ between runs because of the random offsets):
0: 0 1 <-
1: 1 2 <-
2: 4 4
3: 9 9
4: 16 16
5: 25 26 <-
6: 36 36
7: 49 49
8: 64 65 <-
9: 81 82 <-
10: 100 101 <-
Output:
List((0,(0,1)), (1,(1,2)), (5,(25,26)), (8,(64,65)), (9,(81,82)), (10,(100,101)))
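The sketch above compares the submaps sequentially and assumes both maps share the same keys. If you actually want the buckets processed in parallel, one possible variation is to wrap each bucket comparison in a `Future`. The following is only a sketch under some assumptions: the names `ParallelMapDiff`, `partition`, `diff` and `parallelDiff` are made up for illustration, the global `ExecutionContext` is used, buckets are formed by hash modulo rather than by contiguous hash ranges, and a key present in only one map is reported with `None` on the other side.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelMapDiff {
  // Hypothetical helper: bucket a map by the (non-negative) hash of its keys.
  def partition[K, V](m: Map[K, V], n: Int): Map[Int, Map[K, V]] =
    m.groupBy { case (k, _) => Math.floorMod(k.hashCode, n) }

  // Diff of two maps: key -> (value in m1, value in m2); None marks a missing key.
  def diff[K, V](m1: Map[K, V], m2: Map[K, V]): Map[K, (Option[V], Option[V])] =
    (m1.keySet ++ m2.keySet).iterator
      .map(k => (k, (m1.get(k), m2.get(k))))
      .filter { case (_, (a, b)) => a != b }
      .toMap

  // Compare corresponding buckets concurrently, then merge the per-bucket diffs.
  def parallelDiff[K, V](m1: Map[K, V], m2: Map[K, V], n: Int): Map[K, (Option[V], Option[V])] = {
    val parts1 = partition(m1, n)
    val parts2 = partition(m2, n)
    val tasks = (0 until n).map { s =>
      // Equal keys hash to the same bucket, so each pair can be diffed independently.
      Future(diff(parts1.getOrElse(s, Map.empty[K, V]), parts2.getOrElse(s, Map.empty[K, V])))
    }
    // The one-minute timeout is an arbitrary choice for this sketch.
    Await.result(Future.sequence(tasks), 1.minute)
      .foldLeft(Map.empty[K, (Option[V], Option[V])])(_ ++ _)
  }

  def main(args: Array[String]): Unit = {
    val a = Map(1 -> "x", 2 -> "y", 3 -> "z")
    val b = Map(1 -> "x", 2 -> "Y", 4 -> "w")
    println(parallelDiff(a, b, 4).toList.sortBy(_._1))
    // prints List((2,(Some(y),Some(Y))), (3,(Some(z),None)), (4,(None,Some(w))))
  }
}
```

Because equal keys always land in the same bucket, no bucket needs to look at another bucket's data, which is what makes the per-bucket comparisons safe to run concurrently. Whether this beats a single sequential pass depends on the map sizes and on how expensive the value comparison is, so it is worth measuring first.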