Question

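In map-reduce, the map tasks produce partitioned output. Each reducer then pulls its partition's data from the different mappers. How does a reducer know which partition it has to pull from each mapper's output?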

Answer 1

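This is determined by the partitioning mechanism. The partitioner assigns every map output record a partition number from 0 to numPartitions - 1, and the reducer with task number i fetches partition i from each mapper. The default mechanism is hash partitioning, but you can define a custom partitioner instead.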
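To make the default behaviour concrete, here is a minimal sketch of hash partitioning in plain Java, without any Hadoop dependency. It follows the same formula as Hadoop's default HashPartitioner (hash code with the sign bit cleared, modulo the number of reduce tasks); the class name HashPartitionSketch, the method partitionFor and the sample keys are made up for illustration.

```java
// Simplified model of hash partitioning; HashPartitionSketch is a
// hypothetical name, not a Hadoop class.
class HashPartitionSketch {
    // Clearing the sign bit keeps the result non-negative even when
    // hashCode() is negative, so it is always a valid partition index.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 3; // assumed reducer count for the demo
        for (Object key : new Object[] {5, -7, "apple"}) {
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Because the partition depends only on the key and the number of reduce tasks, every mapper sends a given key to the same partition number, which is what lets one reducer collect all values for that key.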

Example of a custom partitioner:

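import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes each key to a reducer based on its first letter.
// Assumes the job is configured with at least 3 reduce tasks.
public class Custom_Partitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        String myKey = key.toString().toLowerCase();
        if (myKey.startsWith("a")) {
            return 0;   // keys starting with "a" go to reducer 0
        } else if (myKey.startsWith("e")) {
            return 1;   // keys starting with "e" go to reducer 1
        } else {
            return 2;   // all other keys go to reducer 2
        }
    }
}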

In the driver code:

job.setPartitionerClass(Custom_Partitioner.class);
job.setNumReduceTasks(3);

So here we set the number of reducer tasks to 3 and code the partitioner accordingly; it should be self-explanatory. Thanks!
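As a rough illustration of how this answers the original question, the sketch below models the shuffle in plain Java: each mapper's output is split into numbered partitions, and reducer i simply takes partition i from every mapper. It is a toy model, not Hadoop's shuffle implementation (which also sorts and merges); ShuffleSketch and its method names and sample keys are hypothetical, and the partition rule copies the first-letter logic of the custom partitioner above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the shuffle; all names here are illustrative.
class ShuffleSketch {
    // Same first-letter rule as the custom partitioner in the answer.
    static int getPartition(String key) {
        String k = key.toLowerCase();
        if (k.startsWith("a")) return 0;
        if (k.startsWith("e")) return 1;
        return 2;
    }

    // Map side: split one mapper's keys into numPartitions buckets,
    // indexed by partition number.
    static List<List<String>> partitionMapOutput(List<String> keys, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(getPartition(key)).add(key);
        }
        return partitions;
    }

    // Reduce side: reducer `reducerId` takes partition `reducerId` from every mapper.
    static List<String> fetchForReducer(int reducerId, List<List<List<String>>> mapOutputs) {
        List<String> input = new ArrayList<>();
        for (List<List<String>> oneMapper : mapOutputs) {
            input.addAll(oneMapper.get(reducerId));
        }
        return input;
    }

    public static void main(String[] args) {
        List<List<List<String>>> mapOutputs = new ArrayList<>();
        mapOutputs.add(partitionMapOutput(Arrays.asList("apple", "egg", "kiwi"), 3));
        mapOutputs.add(partitionMapOutput(Arrays.asList("Avocado", "melon"), 3));
        for (int r = 0; r < 3; r++) {
            System.out.println("reducer " + r + " <- " + fetchForReducer(r, mapOutputs));
        }
    }
}
```

The reducer never has to inspect keys to decide what to pull: its own task number is the partition number, and the partitioner has already guaranteed that matching keys landed there on every mapper.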
