Does the Hadoop reducer implementation rely on a large hash map to deduplicate all values that share a key?

Time: 2018-06-20 22:58:15

Tags: hadoop mapreduce

What does the implementation of a Hadoop MapReduce reducer look like?

For example, given the following file

k1 -> v1
k1 -> v2
k1 -> v3

A naive way to implement the reducer would be

0. init an empty map
1. receive k1 -> v1
2. since k1 does not exist in the map, put k1 -> v1 into the map under key k1
3. receive k1 -> v2
4. k1 is already in the map: fetch k1 -> v1 from the map, dedup it with k1 -> v2, and store the result
5. receive k1 -> v3
6. k1 is already in the map: fetch the stored entry for k1, dedup it with k1 -> v3, and store the result

return the map
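The steps above can be sketched in a few lines of Python. This only illustrates the naive approach described in the question, not how Hadoop works internally. The function name `naive_reduce` is hypothetical, and "dedup" is assumed here to mean "keep the distinct values seen for each key".

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def naive_reduce(
    records: Iterable[Tuple[Hashable, Hashable]],
) -> Dict[Hashable, Set[Hashable]]:
    """Group (key, value) records in one in-memory hash map.

    Assumption: "dedup" means keeping the set of distinct values per key.
    The map holds one entry per distinct key until the input is exhausted.
    """
    seen: Dict[Hashable, Set[Hashable]] = {}  # step 0: empty map
    for key, value in records:                # steps 1, 3, 5: receive a record
        if key not in seen:
            seen[key] = {value}               # step 2: first time the key is seen
        else:
            seen[key].add(value)              # steps 4, 6: merge with the stored entry
    return seen


if __name__ == "__main__":
    records = [("k1", "v1"), ("k1", "v2"), ("k1", "v3")]
    print(naive_reduce(records))
```

Nothing can be emitted for a key until every record has been read, because a later record might still carry that key. That is why the whole map has to stay in memory.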

This requires O(cardinality(k)) memory to store the map, i.e. memory proportional to the number of distinct keys.

This seems very inefficient.

Is this how the reducer is actually implemented?

0 answers:

No answers yet