Question

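My Mapper task returns the following output:

I have already written the reducer code and a key comparator that produce correctly ordered output, but how do I get only the top 3 records of the Mapper output (the top N by count)?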

public static class WLReducer2 extends
        Reducer<IntWritable, Text, Text, IntWritable> {

    @Override
    protected void reduce(IntWritable key, Iterable<Text> values,
            Context context) throws IOException, InterruptedException {

        for (Text x : values) {
            context.write(new Text(x), key);
        }

    }

}

public static class KeyComparator extends WritableComparator {
    protected KeyComparator() {
        super(IntWritable.class, true);
    }

    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        // Reverse the natural IntWritable order so that larger counts sort first.
        IntWritable ip1 = (IntWritable) w1;
        IntWritable ip2 = (IntWritable) w2;
        return -1 * ip1.compareTo(ip2);
    }
}

This is the reducer output:

The expected reducer output is the top 3 by count, i.e.:

r   6
b   3
a   3

Answer 1

Limit the output of the reducer. Something like this:

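public static class WLReducer2 extends
        Reducer<IntWritable, Text, Text, IntWritable> {
    // Number of records seen so far; only meaningful with a single reducer.
    private int count = 0;

    @Override
    protected void reduce(IntWritable key, Iterable<Text> values,
            Context context) throws IOException, InterruptedException {

        for (Text x : values) {
            // Keys arrive in descending count order, so emit only the first 3 records.
            if (count < 3) {
                context.write(new Text(x), key);
            }
            count++;
        }
    }
}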

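Set the number of reducers to 1 with job.setNumReduceTasks(1), so that a single reducer sees all keys and the counter applies to the whole output.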

Answer 2

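If your top N elements fit in memory, you can keep them in a TreeMap, provided your process can do the aggregation with a single reducer.

In the reducer's setup() method, instantiate the TreeMap as an instance variable.
In the reduce() method, aggregate all values for the key group, then compare the result with the first (lowest) key in the tree, map.firstKey(). While the tree holds fewer than N entries, simply insert with map.put(value, Item). Once it is full, insert only if the current value is greater than the lowest key, and then remove the lowest entry with map.remove(map.firstKey()).
In the reducer's cleanup() method, write all TreeMap elements to the output in the desired order.

Note that the value you compare records by must be the key in the TreeMap. The TreeMap's value should be the description, tag, letter, etc. associated with that number.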
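The bounded-TreeMap logic described above can be sketched in plain Java, without the Hadoop classes, so that it runs on its own. This is only an illustration under some assumptions: the class name TopNSketch, the methods offer() and result(), and the sample counts in main() are all hypothetical and do not come from the question. Because a TreeMap holds one value per key, items with equal counts (such as b and a in the expected output) would overwrite each other, so this sketch stores a list of items per count; the order among tied items is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: keeps the N items with the highest counts.
class TopNSketch {
    private final int n;
    // count -> items with that count; TreeMap keeps counts in ascending order.
    // In a reducer this map would be created in setup().
    private final TreeMap<Integer, List<String>> top = new TreeMap<>();
    private int size = 0; // total number of items currently held

    TopNSketch(int n) {
        this.n = n;
    }

    // Would be called once per key group from reduce(), after aggregation.
    void offer(String item, int count) {
        if (n <= 0) {
            return;
        }
        // Tree is full and the candidate does not beat the current minimum.
        if (size >= n && count <= top.firstKey()) {
            return;
        }
        top.computeIfAbsent(count, k -> new ArrayList<>()).add(item);
        size++;
        if (size > n) {
            // Evict one item from the bucket with the lowest count.
            Map.Entry<Integer, List<String>> lowest = top.firstEntry();
            List<String> bucket = lowest.getValue();
            bucket.remove(bucket.size() - 1);
            if (bucket.isEmpty()) {
                top.remove(lowest.getKey());
            }
            size--;
        }
    }

    // Would be called from cleanup(): items in descending count order.
    List<String> result() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : top.descendingMap().entrySet()) {
            for (String item : e.getValue()) {
                out.add(item + "\t" + e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TopNSketch topN = new TopNSketch(3);
        // Made-up (item, count) pairs standing in for aggregated reducer input.
        topN.offer("a", 3);
        topN.offer("b", 3);
        topN.offer("c", 2);
        topN.offer("r", 6);
        topN.offer("d", 1);
        for (String line : topN.result()) {
            System.out.println(line);
        }
    }
}
```

With the made-up input in main(), this prints r with 6 followed by a and b with 3 each. In an actual reducer, the loop in main() would be replaced by context.write() calls inside cleanup().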

Getting the top N items from mapper output - Mapreduce
