I have configured the job to write multiple outputs, as shown below:
// In the run() method:
// Defines an additional text-based output 'text' for the job
MultipleOutputs.addNamedOutput(job, "text", TextOutputFormat.class,
        LongWritable.class, Text.class);
// Defines an additional sequence-file-based output 'seq' for the job
MultipleOutputs.addNamedOutput(job, "seq",
        SequenceFileOutputFormat.class, LongWritable.class, Text.class);
I also have the MultipleOutputs instance set up:
private MultipleOutputs mos;

public void setup(Context context) {
    mos = new MultipleOutputs(context);
}
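For completeness, I also close the MultipleOutputs instance in cleanup(), since it buffers its own record writers and the named outputs can end up empty without this:

```java
@Override
public void cleanup(Context context) throws IOException, InterruptedException {
    // Flush and close all named-output writers created by this task.
    mos.close();
}
```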
In each reducer, I am trying to name the output file after the input file whose data it is processing:
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        String filename = context.getFileClassPaths()[0].toString();
        Text key = new Text(filename); // new Text("");
        mos.write(word, new IntWritable(sum), generateFileName(key, new Text("value")));
    }
}
However, the filename is always null. Can anyone point me to documentation for this in a recent version of Hadoop, or tell me what I am doing wrong? The only answers I can find to this question use the old, deprecated API.
I have also tried context.getInputSplit(), but that gives a compile error...
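For reference, my understanding is that the input split is only available on the map side: a Mapper's Context has getInputSplit(), which can be cast to FileSplit to recover the source file name, while the Reducer's Context does not (which would explain the compile error). A minimal sketch of that map-side approach, assuming a standard word-count mapper (the ONE constant and the tab-separated key format are my own illustration):

```java
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // getInputSplit() works here; cast to FileSplit to get the file path.
        String filename = ((FileSplit) context.getInputSplit()).getPath().getName();
        for (String token : value.toString().split("\\s+")) {
            // Tag each word with its source file so the reducer can see it.
            context.write(new Text(filename + "\t" + token), ONE);
        }
    }
}
```

The reducer could then split the key on the tab character to recover the filename and pass it to generateFileName().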