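I've been trying to debug this error for a while. Basically, I've confirmed that my reducer class is writing the correct output to its context, but for some reason I always get a zero-byte output file.
My mapper class: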
public class FrequencyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        Document t = Jsoup.parse(value.toString());
        String text = t.body().text();
        String[] content = text.split(" ");
        for (String s : content) {
            context.write(new Text(s), new IntWritable(1));
        }
    }
}
My reducer class:
public class FrequencyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int n = 0;
        for (IntWritable i : values) {
            n++;
        }
        if (n > 5) { // Do we need this check?
            context.write(key, new IntWritable(n));
            System.out.println("<" + key + ", " + n + ">");
        }
    }
}
And my driver:
public class FrequencyMain {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration(true);
        // setup the job
        Job job = Job.getInstance(conf, "FrequencyCount");
        job.setJarByClass(FrequencyMain.class);
        job.setMapperClass(FrequencyMapper.class);
        job.setCombinerClass(FrequencyReducer.class);
        job.setReducerClass(FrequencyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
For some reason, "Reduce output records" is always 0:
Job complete: job_local805637130_0001
Counters: 17
  Map-Reduce Framework
    Spilled Records=250
    Map output materialized bytes=1496
    Reduce input records=125
    Map input records=6
    SPLIT_RAW_BYTES=1000
    Map output bytes=57249
    Reduce shuffle bytes=0
    Reduce input groups=75
    Combine output records=125
    Reduce output records=0
    Map output records=5400
    Combine input records=5400
    Total committed heap usage (bytes)=3606577152
  File Input Format Counters
    Bytes Read=509446
  FileSystemCounters
    FILE_BYTES_WRITTEN=385570
    FILE_BYTES_READ=2909134
  File Output Format Counters
    Bytes Written=8
Answer 0 (score: 0)
(Assuming your goal is to print the frequencies of words that occur more than 5 times.)
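The current implementation of the combiner completely breaks the semantics of your program. You need to either remove it or reimplement it: drop the n > 5 check in the combiner (but not in the reducer). And because the reducer's input values are then partial counts rather than all ones, add each value to n instead of doing n++.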
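To make the effect concrete, here is a minimal, Hadoop-free sketch that simulates the values for a single word as they pass through a combiner and then a reducer. Everything in it (the class CombinerSketch and its methods) is hypothetical and only mirrors the logic discussed above; it is not the Hadoop API. It assumes each inner list holds the ones emitted for one word by one mapper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hadoop-free simulation; all names here are hypothetical, for illustration only.
class CombinerSketch {

    // The question's shared combiner/reducer logic: counts records, then filters.
    // Returns null when nothing would be written to the context.
    static Integer countAndFilter(List<Integer> values) {
        int n = 0;
        for (Integer ignored : values) {
            n++;
        }
        return n > 5 ? n : null;
    }

    // Suggested combiner logic: sum the partial counts and never filter.
    static int sumOnly(List<Integer> values) {
        int n = 0;
        for (int v : values) {
            n += v;
        }
        return n;
    }

    // Suggested reducer logic: sum the partial counts, then apply the n > 5 filter.
    static Integer sumAndFilter(List<Integer> values) {
        int n = sumOnly(values);
        return n > 5 ? n : null;
    }

    // The question's setup: the same class acts as combiner and reducer.
    static Integer brokenPipeline(List<List<Integer>> perMapper) {
        List<Integer> reducerInput = new ArrayList<>();
        for (List<Integer> mapperValues : perMapper) {
            Integer partial = countAndFilter(mapperValues); // combiner step
            if (partial != null) {
                reducerInput.add(partial);
            }
        }
        // The reducer is only invoked for keys that still have values.
        return reducerInput.isEmpty() ? null : countAndFilter(reducerInput);
    }

    // The suggested setup: summing combiner, summing-and-filtering reducer.
    static Integer fixedPipeline(List<List<Integer>> perMapper) {
        List<Integer> reducerInput = new ArrayList<>();
        for (List<Integer> mapperValues : perMapper) {
            reducerInput.add(sumOnly(mapperValues)); // combiner step
        }
        return sumAndFilter(reducerInput);
    }

    public static void main(String[] args) {
        // A word seen 8 times by a single mapper.
        List<List<Integer>> oneMapper =
                Collections.singletonList(Collections.nCopies(8, 1));
        // A word seen 4 times by one mapper and 3 times by another.
        List<List<Integer>> twoMappers =
                Arrays.asList(Collections.nCopies(4, 1), Collections.nCopies(3, 1));

        System.out.println("broken, one mapper:  " + brokenPipeline(oneMapper));
        System.out.println("fixed,  one mapper:  " + fixedPipeline(oneMapper));
        System.out.println("broken, two mappers: " + brokenPipeline(twoMappers));
        System.out.println("fixed,  two mappers: " + fixedPipeline(twoMappers));
    }
}
```

In the first case the combiner emits the single record 8, and the counting reducer sees one record, so n is 1 and nothing is written, which is consistent with "Reduce output records=0" in the question. In the second case the filtering combiner drops both partial counts before the reducer ever sees them. In the real Hadoop classes the equivalent change would be n += i.get() in place of n++, with a separate combiner class that writes its sum unconditionally.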