Sending ordered keys to the correct reducer

Time: 2014-10-25 17:46:47

Tags: hadoop

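If I have keys from 0 to 19, I want to create a partitioner that sends the records with keys 0 and 1 to the first reducer, keys 2 and 3 to the second reducer, and so on. Is there a way to do this?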

2 answers:

Answer 0: (score: 1)

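In every case you need to know the number of reducers in advance. As far as I understand your question, the solution is also quite general.

Looking at your question, the keys map to reducers as follows: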

Reducer0 keys 0,1|Reducer1 keys 2,3|Reducer2 keys 4,5|Reducer3 keys 6,7|
Reducer4 keys 8,9|Reducer5 keys 10,11|Reducer6 keys 12,13|Reducer7 keys 14,15|
Reducer8 keys 16,17|Reducer9 keys 18,19

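In this case, get the integer value of the key inside the partitioner

and set the reducer to key / 2, using integer division.

If the key is 13, the reducer is 13 / 2 = 6. If the key is 14, the reducer is 14 / 2 = 7.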

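    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.mapreduce.Partitioner;

    // "Whatever" is a placeholder for your map output value type.
    public static class CustomPartitioner extends Partitioner<IntWritable, Whatever> {

        @Override
        public int getPartition(IntWritable key, Whatever value, int numReduceTasks) {
            // Integer division sends keys 0,1 to reducer 0, keys 2,3 to reducer 1, and so on.
            // The result must stay below numReduceTasks, so keys 0 to 19 need 10 reducers.
            int keyAsInteger = key.get();
            return keyAsInteger / 2;
        }
    }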
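
The key / 2 rule can be checked without a cluster. The following is a minimal plain-Java sketch, not Hadoop code: `PairPartitionSketch` and `partitionFor` are hypothetical names introduced for illustration, and the bounds check is an added assumption about how out-of-range keys should be handled.

```java
// Plain-Java sketch of the key / 2 rule; no Hadoop classes are involved.
class PairPartitionSketch {

    // Hypothetical helper: reducer index for a key, two consecutive keys per reducer.
    static int partitionFor(int key, int numReduceTasks) {
        int partition = key / 2; // integer division: 0,1 -> 0; 2,3 -> 1; ...
        // Assumption: reject keys that would land outside the configured reducers.
        if (key < 0 || partition >= numReduceTasks) {
            throw new IllegalArgumentException(
                "key " + key + " does not fit into " + numReduceTasks + " reducers");
        }
        return partition;
    }

    public static void main(String[] args) {
        int numReduceTasks = 10; // keys 0..19 in pairs need 20 / 2 = 10 reducers
        for (int key = 0; key < 20; key++) {
            System.out.println("key " + key + " -> reducer "
                + partitionFor(key, numReduceTasks));
        }
    }
}
```

With fewer than 10 reducers, the larger keys in this sketch raise an exception instead of silently returning a partition that does not exist.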

Answer 1: (score: 0)

Try this:

public class MyPartitioner extends Partitioner<IntWritable, Text> {
    // number[k] is the reducer index for key k (keys 0 to 19, two keys per reducer).
    static int[] number = {0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9};

    @Override
    public int getPartition(IntWritable key, Text value, int numReduceTasks) {
        return number[key.get()];
    }
}
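
A hand-typed lookup table is easy to mistype, so it may be safer to generate it. This is a rough plain-Java sketch under that assumption; `LookupTableSketch` and `buildTable` are hypothetical names and are not part of Hadoop.

```java
import java.util.Arrays;

// Plain-Java sketch: generate the key -> reducer table instead of typing it by hand.
class LookupTableSketch {

    // Hypothetical helper: table[k] is the reducer index for key k,
    // with keysPerReducer consecutive keys assigned to each reducer.
    static int[] buildTable(int keyCount, int keysPerReducer) {
        int[] table = new int[keyCount];
        for (int key = 0; key < keyCount; key++) {
            table[key] = key / keysPerReducer;
        }
        return table;
    }

    public static void main(String[] args) {
        // Keys 0..19, two keys per reducer.
        System.out.println(Arrays.toString(buildTable(20, 2)));
    }
}
```

A table built this way could be assigned to the static array in a partitioner, which keeps the table approach while avoiding gaps in the reducer indices.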