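I'm working on this Hadoop code, but I can't figure out why it doesn't produce the reducer output and instead writes exactly the same results as the mapper. I've played with the code for a long time and tested different outputs, with no luck.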
My custom mapper:
public static class UserMapper extends Mapper<Object, Text, Text, Text> {
    private final static IntWritable one = new IntWritable(1);
    private Text userid = new Text();
    private Text catid = new Text();

    /* map method */
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString(), ","); /* separated by "," */
        int count = 0;
        userid.set(itr.nextToken());
        while (itr.hasMoreTokens()) {
            if (++count == 4) {
                // catid.set(itr.nextToken());
                catid.set("This is a test");
                context.write(userid, catid);
            } else {
                itr.nextToken();
            }
        }
    }
}
My custom reducer:
/* Reducer Class */
public static class UserReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (Text val : values) {
            sum += 1; //val.get();
        }
        result.set(0);
        context.write(key, result);
    }
}
Body of the main program:
Job job = new Job(conf, "User Popular Categories");
job.setJarByClass(popularCategories.class);
job.setMapperClass(UserMapper.class);
job.setCombinerClass(UserReducer.class);
job.setReducerClass(UserReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setNumReduceTasks(2);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
Output file: /user/hduser/output/part-r-00000
Answer 0 (score: 0)
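I'm still surprised that the code above runs for you at all. As with your other question about the Mapper, "Hadoop (java) change the type of Mapper output values", you should be getting an exception here.
It looks like the output comes from the Mapper rather than the Reducer. Are you sure about the file name? Is it really
/user/hduser/output/part-r-00000
instead of
/user/hduser/output/part-m-00000
The Mapper output should be the Reducer input.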
public static class UserMapper extends Mapper<Object, Text, Text, Text> {
This writes both the output key and the output value as Text.
Your Reducer is defined as
public static class UserReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
This means the input key is Text (correct), but the input value is wrongly declared as IntWritable (it should be Text).
Change the declaration to
public static class UserReducer extends Reducer<Text, Text, Text, IntWritable> {
and set the parameters in the Driver program accordingly.
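A possible reason the job runs without an exception and still shows mapper-style output: as far as I can tell, a reduce(Text, Iterable<Text>, Context) method inside a class that extends Reducer<Text, IntWritable, ...> does not override the inherited method. Java treats it as a separate overload, so the framework would keep calling the default reduce(), which passes every pair through unchanged. The sketch below reproduces that effect in plain Java. BaseReducer, MismatchedReducer and MatchingReducer are hypothetical stand-ins written for illustration, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReduceOverloadDemo {

    // Hypothetical stand-in for a framework base class: the default reduce()
    // is an identity step that re-emits every (key, value) pair unchanged.
    static class BaseReducer<K, V> {
        final List<String> out = new ArrayList<>();

        protected void reduce(K key, Iterable<V> values) {
            for (V value : values) {
                out.add(key + "=" + value);
            }
        }

        // The "framework" entry point always calls reduce(K, Iterable<V>).
        final void run(K key, Iterable<V> values) {
            reduce(key, values);
        }
    }

    // Declared with V = Integer, but reduce() takes Iterable<String>.
    // This compiles as a separate overload, so run() never reaches it.
    static class MismatchedReducer extends BaseReducer<String, Integer> {
        public void reduce(String key, Iterable<String> values) {
            int count = 0;
            for (String ignored : values) {
                count++;
            }
            out.add(key + "=" + count);
        }
    }

    // Declared with V = String, so reduce() really overrides the base method.
    // @Override makes the compiler reject the method if the types drift apart.
    static class MatchingReducer extends BaseReducer<String, String> {
        @Override
        public void reduce(String key, Iterable<String> values) {
            int count = 0;
            for (String ignored : values) {
                count++;
            }
            out.add(key + "=" + count);
        }
    }

    // The raw-typed call mimics a framework that hands values over
    // without static type checks.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<String> runWith(BaseReducer reducer) {
        reducer.run("user1", Arrays.asList("catA", "catB"));
        return reducer.out;
    }

    public static void main(String[] args) {
        System.out.println(runWith(new MismatchedReducer())); // [user1=catA, user1=catB]
        System.out.println(runWith(new MatchingReducer()));   // [user1=2]
    }
}
```

If this is what happens in the job, adding @Override to reduce() in UserReducer should turn the mismatch into a compile error. Once the Reducer is declared as Reducer<Text, Text, Text, IntWritable>, the job.setCombinerClass(UserReducer.class) line would probably have to go as well, since a combiner has to emit the same key and value types as the map output (Text, Text here).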