What does the implementation of a Hadoop MapReduce reducer look like?
For example, given the following input file:
k1 -> v1
k1 -> v2
k1 -> v3
One naive way to implement the reducer is:
0. init an empty map
1. receive k1 -> v1
2. since k1 does not exist in the map, put k1 -> v1 into the map under key k1
3. receive k1 -> v2
4. k1 is already in the map: fetch k1 -> v1 from the map, dedup it with k1 -> v2, and store the result
5. receive k1 -> v3
6. k1 is already in the map: fetch the stored entry for k1, dedup it with k1 -> v3, and store the result
Finally, return the map.
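
The naive approach described above can be sketched in Python roughly as follows. This only illustrates the idea in the question and is not Hadoop's actual implementation. The function name `naive_reduce` is hypothetical, and "dedup" is assumed here to mean keeping the distinct values seen for each key in a set.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def naive_reduce(
    pairs: Iterable[Tuple[Hashable, Hashable]],
) -> Dict[Hashable, Set[Hashable]]:
    """Hypothetical in-memory reducer: one map entry per distinct key."""
    merged: Dict[Hashable, Set[Hashable]] = {}  # step 0: init an empty map
    for key, value in pairs:                    # receive k -> v
        if key not in merged:
            # Key not seen yet: store the first value under this key.
            merged[key] = {value}
        else:
            # Key already present: merge (dedup) the new value into the
            # stored entry. Assumption: dedup means set union.
            merged[key].add(value)
    return merged                               # return the map


records = [("k1", "v1"), ("k1", "v2"), ("k1", "v3")]
result = naive_reduce(records)
```

Note that the map holds every distinct key until the input is exhausted, which is where the memory concern comes from.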
This requires O(cardinality(k)) memory to store the map.
This seems very inefficient.
Is this how the reducer is actually implemented?