Hadoop: getting the input file name in the reducer

Time: 2015-12-28 14:51:04

Tags: java hadoop cloudera

I have configured the job to write multiple output files, as follows:

        // In run():
        // Defines an additional text-based named output 'text' for the job
        MultipleOutputs.addNamedOutput(job, "text", TextOutputFormat.class,
                LongWritable.class, Text.class);
        // Defines an additional sequence-file-based named output 'seq' for the job
        MultipleOutputs.addNamedOutput(job, "seq",
                SequenceFileOutputFormat.class, LongWritable.class, Text.class);

I also set up MultipleOutputs in the reducer:

    // Typed to match the reducer's output key/value classes
    private MultipleOutputs<Text, IntWritable> mos;

    @Override
    public void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

I am trying to name each reducer's output file after the input file whose data it processed:

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text word, Iterable<IntWritable> counts,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable count : counts) {
                sum += count.get();
            }

            // Attempt to read the input file name (this is the part that fails)
            String filename = context.getFileClassPaths()[0].toString();
            Text key = new Text(filename);// new Text("");
            mos.write(word, new IntWritable(sum), generateFileName(key, new Text("value")));
        }
    }

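However, the file name is always null. Can anyone point me to documentation for the latest version of Hadoop, or tell me what I am doing wrong? The only answers I can find for this question use the outdated API.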

I also tried context.getInputSplit(), but it gives a compile error...
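
A possible explanation, offered as a sketch rather than a confirmed diagnosis: in the new API, getInputSplit() is declared on the map-side context only, so calling it on a reducer's Context does not compile. A reducer receives records merged from many map tasks, so it has no single input split to report. One common workaround is to read the name in the mapper (for FileInputFormat-based input, typically `((FileSplit) context.getInputSplit()).getPath().getName()`) and carry it to the reducer inside the key. The plain-Java sketch below shows only that key-tagging idea, with no Hadoop dependency; `SourceTaggedKey` and its methods are hypothetical names, and it assumes a tab never appears in a file name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: packs the source file name into
// the key so that the reduce side can recover it.
final class SourceTaggedKey {
    // Assumption: a tab never occurs inside a file name.
    private static final char SEPARATOR = '\t';

    private SourceTaggedKey() {}

    /** Map side: build the composite key "fileName<TAB>word". */
    static String tag(String fileName, String word) {
        return fileName + SEPARATOR + word;
    }

    /** Reduce side: recover the file name from a composite key. */
    static String fileNameOf(String taggedKey) {
        return taggedKey.substring(0, taggedKey.indexOf(SEPARATOR));
    }

    /** Reduce side: recover the original word from a composite key. */
    static String wordOf(String taggedKey) {
        return taggedKey.substring(taggedKey.indexOf(SEPARATOR) + 1);
    }

    /** Stand-in for shuffle plus reduce: counts occurrences per composite key. */
    static Map<String, Integer> sumByKey(List<String> taggedKeys) {
        Map<String, Integer> sums = new TreeMap<>();
        for (String key : taggedKeys) {
            sums.merge(key, 1, Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // Keys as a mapper might emit them for two input files.
        List<String> emitted = new ArrayList<>();
        emitted.add(tag("a.txt", "hadoop"));
        emitted.add(tag("a.txt", "hadoop"));
        emitted.add(tag("b.txt", "hadoop"));

        for (Map.Entry<String, Integer> e : sumByKey(emitted).entrySet()) {
            System.out.println(fileNameOf(e.getKey()) + " "
                    + wordOf(e.getKey()) + " " + e.getValue());
        }
    }
}
```

Note that tagging the key changes the grouping: counts become per word per file rather than per word overall. In a real job the recovered file name could then be passed as the base output path argument of mos.write.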

0 answers:

No answers