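This is my reducer. It accepts EdgeWritable keys and NullWritable values.
An EdgeWritable holds 4 integers, for example <71, 74, 7, 2000>, meaning that in July (month 7) of 2000 (year) there was a communication from 71 (FromID) to 74 (ToID).
The mapper outputs 10787 records to the reducer, but the reducer outputs only 1.
I need to output 44 files, one for each of the 44 months from October 1998 to July 2002. The output file name format is "out" + month + year. For example, records for July 2002 go into the file out72002.
I debugged the code. After one iteration it writes one file and stops without moving on to the next record. Please suggest how I should use MultipleOutputs. Thanks.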
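import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
public class MultipleOutputReducer extends Reducer<EdgeWritable, NullWritable, IntWritable, IntWritable>{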
private MultipleOutputs<IntWritable,IntWritable> multipleOutputs;
protected void setup(Context context) throws IOException, InterruptedException{
multipleOutputs = new MultipleOutputs<IntWritable, IntWritable>(context);
}
@Override
public void reduce(EdgeWritable key, Iterable<NullWritable> val, Context context) throws IOException, InterruptedException {
int year = key.get(3).get();
int month = key.get(2).get();
int to = key.get(1).get();
int from = key.get(0).get();
//if(year >= 1997 && year <= 2001){
if((month >= 9 && year >= 1997) || (month <= 6 && year <= 2001)){
multipleOutputs.write(new IntWritable(from), new IntWritable(to), "out"+month+year );
}
//}
}
@Override
public void cleanup(Context context) throws IOException, InterruptedException{
multipleOutputs.close();
}
}
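import java.text.SimpleDateFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
public class TimeSlicingDriver extends Configured implements Tool{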
static final SimpleDateFormat sdf = new SimpleDateFormat("EEE, d MMM yyyy HH:mm:ss Z");
public int run(String[] args) throws Exception {
if(args.length != 2){
System.out.println("Enter <input path> <output path>");
System.exit(-1);
}
Configuration setup = new Configuration();
//setup.set("Input Path", args[0]);
Job job = new Job(setup, "Time Slicing");
//job.setJobName("Time Slicing");
job.setJarByClass(TimeSlicingDriver.class);
job.setMapperClass(TimeSlicingMapper.class);
job.setReducerClass(MultipleOutputReducer.class);
//MultipleOutputs.addNamedOutput(setup, "output", org.apache.hadoop.mapred.TextOutputFormat.class, EdgeWritable.class, NullWritable.class);
job.setMapOutputKeyClass(EdgeWritable.class);
job.setMapOutputValueClass(NullWritable.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(IntWritable.class);
/**Set the Input File Path and output file path*/
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
return job.waitForCompletion(true)?0:1;
}
}
Answer 0 (score: 0)
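You are not iterating over the Iterable "val", so the logic in your reduce method runs only once per key group rather than once per record.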
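To illustrate the point, here is a minimal Hadoop-free sketch of what iterating over the values looks like. Everything in it is a stand-in I made up for illustration: `ReduceIterationSketch`, the `int[]` key (in place of EdgeWritable), the `outputs` map (in place of MultipleOutputs.write) and the `inRange` helper are all hypothetical. The month bounds are taken from the question text (October 1998 to July 2002); note that the condition in the posted reducer uses 1997 and 2001 and does not describe one contiguous range, so adjust the bounds to whatever you actually need.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReduceIterationSketch {

    // True if (month, year) lies between (startMonth, startYear) and
    // (endMonth, endYear), both ends inclusive. Months are 1-based here.
    static boolean inRange(int month, int year,
                           int startMonth, int startYear,
                           int endMonth, int endYear) {
        int index = year * 12 + month;
        return index >= startYear * 12 + startMonth
            && index <= endYear * 12 + endMonth;
    }

    // Mirrors reduce(key, values, context): it is called once per key group.
    // key = {from, to, month, year}; outputs maps a file name to its lines
    // and stands in for MultipleOutputs.write(key, value, baseOutputPath).
    static void reduce(int[] key, Iterable<?> values,
                       Map<String, List<String>> outputs) {
        int from = key[0];
        int to = key[1];
        int month = key[2];
        int year = key[3];
        if (!inRange(month, year, 10, 1998, 7, 2002)) {
            return;
        }
        String baseOutputPath = "out" + month + year;
        // The loop is the important part: one write per value in the group,
        // instead of a single write for the whole group.
        for (Object ignored : values) {
            outputs.computeIfAbsent(baseOutputPath, k -> new ArrayList<>())
                   .add(from + "\t" + to);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> outputs = new TreeMap<>();
        // One group containing three records with the same key.
        reduce(new int[] {71, 74, 7, 2000}, Arrays.asList(0, 0, 0), outputs);
        System.out.println(outputs);
    }
}
```

In the real reducer the same shape would be a `for (NullWritable v : val)` loop around the `multipleOutputs.write(...)` call. If the 10787 records really end up in a single group, it may also be worth checking how EdgeWritable compares keys, since records whose keys compare as equal are handed to one reduce call. Hadoop reuses the key object while you iterate, so reading the key fields inside the loop should give you the fields of the current record.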