Question

I have configured my job to write multiple outputs, as follows:

        // in the run:
        // Defines additional single text based output 'text' for the job
        MultipleOutputs.addNamedOutput(job, "text", TextOutputFormat.class,
                LongWritable.class, Text.class);
        // Defines additional sequence-file based output 'sequence' for the job
        MultipleOutputs.addNamedOutput(job, "seq",
                SequenceFileOutputFormat.class, LongWritable.class, Text.class);

I also set up MultipleOutputs:

    private MultipleOutputs mos;

    public void setup(Context context) {
        mos = new MultipleOutputs(context);
    }

I am trying to name each reducer's output file after the input file whose data it processes:

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text word, Iterable<IntWritable> counts,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable count : counts) {
                sum += count.get();
            }

            String filename = context.getFileClassPaths()[0].toString();
            Text key = new Text(filename);// new Text("");
            mos.write(word, new IntWritable(sum), generateFileName(key, new Text("value")));
        }
    }

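However, the file name is always null. Can anyone point me to documentation for the latest version of Hadoop, or tell me what I am doing wrong? The only answers I can find to this question use the deprecated API.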

I also tried context.getInputSplit(), but it gives a compile error...
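
For context: in the org.apache.hadoop.mapreduce API, getInputSplit() is available on the mapper's context but not on the reducer's, because a reducer receives values merged from many map tasks and so has no single input file. getFileClassPaths() returns the classpath entries registered for the job, not input files, which would explain the null. One possible workaround is to read the file name in the mapper and carry it to the reducer inside the key. The sketch below shows only that tagging logic in plain Java so it runs without Hadoop; FileTaggedKey and its methods are hypothetical names, not Hadoop classes, and the tab separator assumes that file names never contain a tab.

```java
import java.util.Objects;

// Hypothetical helper (not part of Hadoop): carries the input file name
// from the map side to the reduce side inside the key.
final class FileTaggedKey {
    // Assumption: file names never contain a tab character.
    private static final char SEP = '\t';

    private FileTaggedKey() {}

    // Map side. In a real Mapper, fileName would come from
    // ((FileSplit) context.getInputSplit()).getPath().getName().
    static String tag(String fileName, String word) {
        Objects.requireNonNull(fileName, "fileName");
        Objects.requireNonNull(word, "word");
        return fileName + SEP + word;
    }

    // Reduce side: recover the file name from a tagged key.
    static String fileOf(String taggedKey) {
        return taggedKey.substring(0, separatorIndex(taggedKey));
    }

    // Reduce side: recover the original word from a tagged key.
    static String wordOf(String taggedKey) {
        return taggedKey.substring(separatorIndex(taggedKey) + 1);
    }

    // Reduce side: a sanitized name that could be passed as the third
    // argument of MultipleOutputs.write(key, value, baseOutputPath).
    static String baseOutputPath(String taggedKey) {
        return fileOf(taggedKey).replaceAll("[^A-Za-z0-9._-]", "_");
    }

    private static int separatorIndex(String taggedKey) {
        int i = taggedKey.indexOf(SEP);
        if (i < 0) {
            throw new IllegalArgumentException("untagged key: " + taggedKey);
        }
        return i;
    }

    public static void main(String[] args) {
        String key = tag("part one.txt", "hadoop");
        System.out.println(wordOf(key) + " -> " + baseOutputPath(key));
    }
}
```

With keys tagged this way, each reduce call sees the counts for one word from one file, so the totals become per-file rather than global. If global totals are needed, the file name would have to travel in the value instead.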

hadoop: getting the file name in a reducer

0 answers: