I am using MultipleOutputs like this:
public int run(String[] args) throws Exception {
    ...
    job1.setInputFormatClass(TextInputFormat.class);
    job1.setOutputFormatClass(TextOutputFormat.class);
    MultipleOutputs.addNamedOutput(job1, "stopwords", TextOutputFormat.class, Text.class, IntWritable.class);
    ...
}
In the Reducer:
public static class ReduceWordCount extends Reducer<Text, IntWritable, Text, IntWritable> {
    private MultipleOutputs<Text, IntWritable> mos;

    @Override
    public void setup(Context context) {
        mos = new MultipleOutputs<Text, IntWritable>(context);
    }

    @Override
    public void reduce(Text word, Iterable<IntWritable> counts, Context context) throws IOException, InterruptedException {
        // Sum all counts for this word
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        if (sum > 4000) {
            context.write(word, new IntWritable(sum));
            // Wrap the value in IntWritable to match the value class declared in addNamedOutput
            mos.write("stopwords", new Text(word + ", "), new IntWritable(sum), "stopwords.csv");
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Closing MultipleOutputs flushes the named outputs
        mos.close();
    }
}
The output file I get is stopwords.csv-r-00000. I need to get rid of the -r-00000 suffix. How can I do this?
Answer 0 (score: 0)
For anyone who may be interested: I found the answer here. It renames the files after the job has finished:
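FileSystem hdfs = FileSystem.get(getConf());
// args[1] is the job's output directory
FileStatus[] statuses = hdfs.listStatus(new Path(args[1]));
if (statuses != null) {
    for (FileStatus aFile : statuses) {
        // isDir() is deprecated in newer Hadoop releases in favor of isDirectory()
        if (!aFile.isDir()) {
            // Appends ".txt" to every file name in the output directory
            hdfs.rename(aFile.getPath(), new Path(aFile.getPath().toString() + ".txt"));
        }
    }
}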
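Note that the loop above appends ".txt" rather than removing the -r-00000 part. Below is a minimal sketch of how the new name could be computed so that the suffix is dropped instead. `PartSuffix` and `strip` are hypothetical names introduced here, not part of Hadoop, and the pattern assumes the usual `-r-` or `-m-` marker followed by at least five digits at the very end of the name (that is, no compression extension).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Hadoop class): computes a file name
// without the trailing "-r-00000" / "-m-00000" style suffix.
final class PartSuffix {
    // Anything, then "-r-" or "-m-", then five or more digits at the end of the name.
    private static final Pattern SUFFIX = Pattern.compile("^(.*)-[mr]-\\d{5,}$");

    private PartSuffix() {}

    // Returns the name without the suffix, or the name unchanged if it has none.
    static String strip(String fileName) {
        Matcher m = SUFFIX.matcher(fileName);
        return m.matches() ? m.group(1) : fileName;
    }
}
```

Inside the rename loop this could be used as `hdfs.rename(aFile.getPath(), new Path(aFile.getPath().getParent(), PartSuffix.strip(aFile.getPath().getName())))`. Two caveats: the regular part-r-00000 files would be renamed to "part" as well, so you may want to filter on the file name first, and with more than one reducer several files would map to the same stripped name, so this only makes sense when each named output produces a single file.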