How to maintain the order of MapWritables in the reducer?

Time: 2014-06-19 21:37:32

Tags: hadoop mapreduce writable

My Mapper implementation:

import java.io.IOException;

import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.elasticsearch.hadoop.mr.LinkedMapWritable; // LinkedMapWritable comes from elasticsearch-hadoop

public class SimpleMapper extends Mapper<Text, Text, Text, MapWritable> {

    @Override
    protected void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {

        // MapWritable keys and values must be Writable, so wrap the Strings in Text
        MapWritable writable = new LinkedMapWritable();
        writable.put(new Text("unique_key"), new Text("one"));
        writable.put(new Text("another_key"), new Text("two"));
        context.write(new Text("key"), writable);
    }
}

The Reducer implementation is:

import java.io.IOException;

import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SimpleReducer extends Reducer<Text, MapWritable, NullWritable, Text> {

    @Override
    protected void reduce(Text key, Iterable<MapWritable> values, Context context)
            throws IOException, InterruptedException {

        // The MapWritables have to be ordered based on the "unique_key" inserted into them
    }
}

Do I have to use secondary sort? Is there any other way to do this?

1 Answer:

Answer 0 (score: 1)

The MapWritable values that reach a reducer arrive in an unpredictable order; the order can vary from run to run, and you cannot control it.

What the Map/Reduce paradigm does guarantee, however, is that the keys presented to a reducer arrive in sorted order, and that all values belonging to a single key go to a single reducer.

So you can absolutely use secondary sort with a custom partitioner for your use case.
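
To make that concrete, here is a minimal secondary-sort sketch following the usual pattern (the class names CompositeKey, NaturalKeyPartitioner, and NaturalKeyGroupingComparator are illustrative, not from the question): promote the "unique_key" value into a composite key, partition and group on the natural key only, and let the framework's sort phase order the values before they reach reduce().

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Partitioner;

// Composite key: the natural key plus the "unique_key" value to sort on.
public class CompositeKey implements WritableComparable<CompositeKey> {

    private final Text naturalKey = new Text(); // the original reducer key
    private final Text sortKey = new Text();    // the value to order records by

    public void set(String natural, String sort) {
        naturalKey.set(natural);
        sortKey.set(sort);
    }

    public Text getNaturalKey() {
        return naturalKey;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        naturalKey.write(out);
        sortKey.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        naturalKey.readFields(in);
        sortKey.readFields(in);
    }

    // Sort by natural key first, then by the secondary sort key, so the
    // records for one natural key arrive at the reducer in sorted order.
    @Override
    public int compareTo(CompositeKey other) {
        int cmp = naturalKey.compareTo(other.naturalKey);
        return cmp != 0 ? cmp : sortKey.compareTo(other.sortKey);
    }
}

// Partition on the natural key only, so every record for a given natural
// key lands on the same reducer regardless of its sort key.
class NaturalKeyPartitioner extends Partitioner<CompositeKey, MapWritable> {

    @Override
    public int getPartition(CompositeKey key, MapWritable value, int numPartitions) {
        return (key.getNaturalKey().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// Group on the natural key only, so a single reduce() call receives all
// values for one natural key, already ordered by the sort key.
class NaturalKeyGroupingComparator extends WritableComparator {

    protected NaturalKeyGroupingComparator() {
        super(CompositeKey.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        return ((CompositeKey) a).getNaturalKey()
                .compareTo(((CompositeKey) b).getNaturalKey());
    }
}

In the job driver you would register these with job.setPartitionerClass(NaturalKeyPartitioner.class) and job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class), and change the mapper to emit CompositeKey as its output key. The grouping comparator then makes all composite keys sharing a natural key appear as one reduce group, with the values already in the desired order.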