Question

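I'm working on this Hadoop code, but I can't figure out why it doesn't produce the reducer output and instead outputs exactly the same results as the mapper. I've been playing with the code for a long time and testing different outputs, with no luck.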

My custom mapper:

public static class UserMapper extends Mapper<Object, Text, Text, Text> {
    private final static IntWritable one = new IntWritable(1);
    private Text userid = new Text();
    private Text catid = new Text();

    /* map method */
    public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString(), ","); /* separated by "," */
        int count = 0;

        userid.set(itr.nextToken());

        while (itr.hasMoreTokens()) {
            if (++count == 4) {
                // catid.set(itr.nextToken());
                catid.set("This is a test");
                context.write(userid, catid);
            }else {
                itr.nextToken();

            }
        }
    }
}

My custom reducer:

/* Reducer Class */
public static class UserReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
        int sum = 0;
        for (Text val : values) {
            sum += 1; //val.get();
        }
        result.set(0);
        context.write(key, result);
    }
}

The main body of the program (the output file is /user/hduser/output/part-r-00000):

Job job = new Job(conf, "User Popular Categories");
job.setJarByClass(popularCategories.class);
job.setMapperClass(UserMapper.class);
job.setCombinerClass(UserReducer.class);
job.setReducerClass(UserReducer.class);

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setNumReduceTasks(2);

FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

System.exit(job.waitForCompletion(true) ? 0 : 1);

Answer 1

Still, I'm surprised that the code above works for you at all. As with your other question about the Mapper, Hadoop (java) change the type of Mapper output values, you should get an exception here.

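It looks as if the output comes from the Mapper rather than the Reducer. Are you sure about the file name, i.e. that you are reading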

 /user/hduser/output/part-r-00000 

instead of  

 /user/hduser/output/part-m-00000
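For reference, under the new org.apache.hadoop.mapreduce API the default output files are named part-m-NNNNN when map tasks write them (a map-only job with zero reducers) and part-r-NNNNN when reduce tasks write them, with a five-digit partition number. The helper below is a hypothetical sketch (OutputNames.expectedOutputFiles is not a Hadoop method), assuming the default base name "part"; it just lists the names one would expect to find. Since the question's driver calls setNumReduceTasks(2), the reducer output should be split across part-r-00000 and part-r-00001, so part-r-00000 on its own may hold only some of the keys.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: lists the file names a job using
// the default base name "part" is expected to leave in its output directory.
class OutputNames {

    // Map-only jobs (0 reducers) write one part-m-NNNNN file per map task;
    // otherwise each reduce task writes one part-r-NNNNN file.
    static List<String> expectedOutputFiles(int numReduceTasks, int numMapTasks) {
        boolean mapOnly = numReduceTasks == 0;
        int files = mapOnly ? numMapTasks : numReduceTasks;
        char kind = mapOnly ? 'm' : 'r';
        List<String> names = new ArrayList<>();
        for (int partition = 0; partition < files; partition++) {
            // Five-digit, zero-padded partition number.
            names.add(String.format("part-%c-%05d", kind, partition));
        }
        return names;
    }

    public static void main(String[] args) {
        // The question's driver calls job.setNumReduceTasks(2).
        System.out.println(expectedOutputFiles(2, 1)); // [part-r-00000, part-r-00001]
        // A map-only job with one map task.
        System.out.println(expectedOutputFiles(0, 1)); // [part-m-00000]
    }
}
```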

The Mapper's output types must be the Reducer's input types. Your Mapper is declared as

public static class UserMapper extends Mapper<Object, Text, Text, Text> {

so it writes its output key as Text and its output value as Text.

Your Reducer is defined as

public static class UserReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

which means its input key is Text (correct), but its input value is wrongly declared as IntWritable (it should be Text).
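One way to see this mismatch is to compare the type arguments directly. The sketch below is plain Java with hypothetical stand-ins for Mapper, Reducer, Text and IntWritable (not the Hadoop classes), so it runs without Hadoop on the classpath. It reads each class's generic superclass by reflection and checks that the mapper's KEYOUT/VALUEOUT equal the reducer's KEYIN/VALUEIN. The same reflection calls should also work on the real classes, assuming they extend Mapper and Reducer directly.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

class TypeChainCheck {
    // Hypothetical stand-ins with the same four type parameters as Hadoop's
    // Mapper/Reducer: <KEYIN, VALUEIN, KEYOUT, VALUEOUT>.
    static class Mapper<KI, VI, KO, VO> {}
    static class Reducer<KI, VI, KO, VO> {}

    // Hypothetical stand-ins for Text and IntWritable, used only as type tags.
    static class Text {}
    static class IntWritable {}

    // Declarations copied from the question and the answer.
    static class UserMapper extends Mapper<Object, Text, Text, Text> {}
    static class BrokenReducer extends Reducer<Text, IntWritable, Text, IntWritable> {}
    static class FixedReducer extends Reducer<Text, Text, Text, IntWritable> {}

    // Actual type arguments a class passes to its direct generic superclass.
    static Type[] typeArgs(Class<?> c) {
        return ((ParameterizedType) c.getGenericSuperclass()).getActualTypeArguments();
    }

    // The mapper's KEYOUT/VALUEOUT must equal the reducer's KEYIN/VALUEIN.
    static boolean typesLineUp(Class<?> mapper, Class<?> reducer) {
        Type[] m = typeArgs(mapper);
        Type[] r = typeArgs(reducer);
        return m[2].equals(r[0]) && m[3].equals(r[1]);
    }

    public static void main(String[] args) {
        System.out.println(typesLineUp(UserMapper.class, BrokenReducer.class)); // false
        System.out.println(typesLineUp(UserMapper.class, FixedReducer.class));  // true
    }
}
```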

Change the declaration to

public static class UserReducer extends Reducer<Text, Text, Text, IntWritable> {

and set the parameters in the Driver program accordingly.
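Why the job prints the mapper's pairs comes down to Java overriding. Hadoop's Reducer.reduce is declared as reduce(KEYIN, Iterable<VALUEIN>, Context), and its default implementation writes every value through unchanged. With VALUEIN declared as IntWritable, a reduce(Text, Iterable<Text>, Context) method only overloads it, so the framework keeps calling the default; after the declaration change it overrides, and annotating it with @Override lets the compiler confirm that. The plain-Java simulation below mimics this with a hypothetical MiniReducer stand-in and a List<String> in place of Context; it is a sketch of the mechanism, not Hadoop code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation; MiniReducer is a hypothetical stand-in for
// org.apache.hadoop.mapreduce.Reducer, and List<String> stands in for Context.
class OverrideDemo {

    static class MiniReducer<KIN, VIN, KOUT, VOUT> {
        // Stand-in for context.write(key, value).
        protected void write(List<String> out, Object key, Object value) {
            out.add(key + "\t" + value);
        }

        // Default behaviour, like Hadoop's Reducer: pass every value through unchanged.
        public void reduce(KIN key, Iterable<VIN> values, List<String> out) {
            for (VIN value : values) {
                write(out, key, value);
            }
        }
    }

    // Mirrors the question: VIN is declared as Integer, but this method takes
    // Iterable<String>, so it OVERLOADS reduce() instead of overriding it.
    // Putting @Override on it would be a compile-time error.
    static class BrokenReducer extends MiniReducer<String, Integer, String, Integer> {
        public void reduce(String key, Iterable<String> values, List<String> out) {
            int count = 0;
            for (String ignored : values) {
                count++;
            }
            write(out, key, count);
        }
    }

    // Mirrors the answer's fix: VIN matches what the mapper emits, so this overrides.
    static class FixedReducer extends MiniReducer<String, String, String, Integer> {
        @Override
        public void reduce(String key, Iterable<String> values, List<String> out) {
            int count = 0;
            for (String ignored : values) {
                count++;
            }
            write(out, key, count);
        }
    }

    // Like the framework, this caller only knows the base type and does not
    // check value types at runtime, hence the deliberate raw type.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List<String> runReduce(MiniReducer reducer, String key, List<String> values) {
        List<String> out = new ArrayList<>();
        reducer.reduce(key, values, out);
        return out;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("This is a test", "This is a test");
        // Default reduce runs: the mapper's pairs come out unchanged.
        System.out.println(runReduce(new BrokenReducer(), "u1", values));
        // The overriding reduce runs: one counted pair per key.
        System.out.println(runReduce(new FixedReducer(), "u1", values));
    }
}
```

Once the real reduce runs, note that the driver also registers UserReducer as the combiner. A combiner's output types must match the map output types (Text, Text), so with the new declaration the setCombinerClass line would likely need to be removed or replaced by a combiner that emits Text values.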
