Question

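What is the difference between calling context.write() in reduce() and calling it in cleanup()? I read somewhere that cleanup() is only called after the output in temp_dir has been moved to the specified output directory?

How do I use a TreeMap in an MR job? Some examples would help.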

Answer 1

A reducer task runs the following methods, in this order:

run():
 setup()
 for each key (with its grouped values):
    reduce()
 cleanup()

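As you can see, setup() and cleanup() are each called only once per reducer task, while reduce() is called once per key, with all the values grouped under that key.

In reduce(), you only see one key and its values at a time. In cleanup(), you can work with everything accumulated across all the reduce() calls of that task, do some processing, and then emit the output.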
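To make the difference concrete, here is a minimal sketch in plain Java with no Hadoop dependency. The class `ReducerLifecycleDemo` and its `emit()` method are hypothetical stand-ins: in a real job the class would extend `org.apache.hadoop.mapreduce.Reducer`, and `emit()` would be a call to `context.write()`. It assumes the values are integers that you want to sum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a reducer task; it only models the call order.
class ReducerLifecycleDemo {
    // Stand-in for the job output: each emit() models one context.write().
    final List<String> output = new ArrayList<>();
    // State that survives across reduce() calls within one task.
    private int total;

    // Called once, before the first key.
    void setup() {
        total = 0;
    }

    // Called once per key, with all values grouped under that key.
    void reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        total += sum;
        emit(key + "=" + sum); // a write from reduce(): one record per key
    }

    // Called once, after the last key.
    void cleanup() {
        emit("TOTAL=" + total); // a write from cleanup(): depends on all keys
    }

    void emit(String record) {
        output.add(record);
    }

    // Mirrors the run() outline: setup, reduce per key, cleanup.
    static List<String> run(Map<String, List<Integer>> grouped) {
        ReducerLifecycleDemo task = new ReducerLifecycleDemo();
        task.setup();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            task.reduce(e.getKey(), e.getValue());
        }
        task.cleanup();
        return task.output;
    }
}
```

A write in reduce() can only use what is known for the current key. A write in cleanup() can use anything the task has collected over all of its keys, such as the running total here.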

Example:

reducer task:
  setup: create a map (hash or tree)
  for each reduce() call: store key, values in map
  cleanup() : use the map and emit the key or values or both you are interested in.

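Warning: if you store too much data in the internal structure (the tree map here), you may hit the memory limit of the machine that is running the reduce task.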
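One way to apply the outline above while keeping memory bounded is a top-N reducer: keep at most N entries in a TreeMap during reduce(), and emit them in cleanup(). The sketch below is plain Java without Hadoop classes. `TopNReducerSketch`, its `output` list, and the choice of "sum of values" as the ranking are all assumptions made for illustration. In a real reducer, the lines that add to `output` would be context.write() calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the "buffer in reduce(), emit in cleanup()" pattern.
class TopNReducerSketch {
    private final int n;
    // sum -> keys with that sum; TreeMap keeps the sums in ascending order.
    private TreeMap<Integer, List<String>> bySum;
    private int held; // number of keys currently buffered
    // Stand-in for the job output (context.write() in a real reducer).
    final List<String> output = new ArrayList<>();

    TopNReducerSketch(int n) {
        this.n = n;
    }

    void setup() {
        bySum = new TreeMap<>();
        held = 0;
    }

    // Nothing is emitted here; each key is only ranked and buffered.
    void reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        bySum.computeIfAbsent(sum, s -> new ArrayList<>()).add(key);
        held++;
        if (held > n) {
            // Evict one key from the smallest bucket so the map never
            // holds more than n keys, which addresses the memory warning.
            Map.Entry<Integer, List<String>> smallest = bySum.firstEntry();
            List<String> keys = smallest.getValue();
            keys.remove(keys.size() - 1);
            if (keys.isEmpty()) {
                bySum.remove(smallest.getKey());
            }
            held--;
        }
    }

    // Emit the survivors, largest sum first.
    void cleanup() {
        for (Map.Entry<Integer, List<String>> e : bySum.descendingMap().entrySet()) {
            for (String key : e.getValue()) {
                output.add(key + "\t" + e.getKey());
            }
        }
    }

    static List<String> run(int n, Map<String, List<Integer>> grouped) {
        TopNReducerSketch task = new TopNReducerSketch(n);
        task.setup();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            task.reduce(e.getKey(), e.getValue());
        }
        task.cleanup();
        return task.output;
    }
}
```

The output has to wait for cleanup() because the top N is only known once the task has seen every key. Note that with several reducer tasks, each one emits its own local top N, so a global top N needs a single reducer or a second pass.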
