MapReduce output is incorrect

Date: 2016-06-10 09:35:29

Tags: hadoop mapreduce

I have an input file:

UserId|TrackId|Shared|Radio|Skip
111115|222|0|1|0
111113|225|1|0|0
111117|223|0|1|1
111115|225|1|0|0 

For each TrackId, I need to sum the Shared and Radio columns across all rows. The output should be:

222,1
223,1
225,2
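To check the expected output by hand, here is a minimal plain-Java sketch that replays the job in memory with no Hadoop dependency. It assumes the four sample rows above (without the header line): a map step computes Shared + Radio per row, a TreeMap stands in for the shuffle by grouping values under each TrackId in sorted order, and a reduce step sums each group. The class ExpectedTotals and its run method are hypothetical names for illustration, not part of Hadoop's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ExpectedTotals {
    // Hypothetical in-memory stand-in for the job: map, group by key, reduce.
    static List<String> run(List<String> rows) {
        // "Shuffle": TrackId -> list of mapper values, keys kept in sorted order.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String row : rows) {
            String[] arr = row.trim().split("[|]");
            // Map step: Shared (column 2) + Radio (column 3).
            int total = Integer.parseInt(arr[2]) + Integer.parseInt(arr[3]);
            grouped.computeIfAbsent(arr[1], k -> new ArrayList<>()).add(total);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            // Reduce step: add up every value for this key.
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            out.add(e.getKey() + "," + sum); // one output line per key
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "111115|222|0|1|0",
            "111113|225|1|0|0",
            "111117|223|0|1|1",
            "111115|225|1|0|0 ");
        run(rows).forEach(System.out::println);
    }
}
```

Running main prints 222,1, 223,1 and 225,2 on separate lines: key 225 collects the values 1 and 1 from its two rows, giving 2. In the real job, TextOutputFormat separates key and value with a tab by default rather than a comma.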

With the program I wrote (below), I get:

222,1
223,1
225,1
225,2.

I'm not sure what the mistake is.

Here is my program:

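import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class Total {

    // Emits (TrackId, Shared + Radio) for every input line.
    public static class ListenMap extends Mapper<LongWritable, Text, Text, IntWritable>
    {
        @Override
        public void map(LongWritable key, Text values, Context context) throws IOException, InterruptedException
        {
            String slt = values.toString();
            String[] arr = slt.split("[|]");
            String trackid = arr[1];
            String shared = arr[2];
            String radio = arr[3];
            int sharenum = Integer.parseInt(shared);
            int radionum = Integer.parseInt(radio);
            int total = sharenum + radionum;
            context.write(new Text(trackid), new IntWritable(total));
        }
    }

    public static class ListenReduce extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
        {
            int sum = 0;
            for (IntWritable x : values)
            {
                sum += x.get();
                context.write(key, new IntWritable(sum));
            }
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "listen");

        job.setJarByClass(Total.class);
        job.setMapperClass(ListenMap.class);
        job.setReducerClass(ListenReduce.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Exit status 0 signals success to the shell.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}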
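The mapper splits each line with the regex "[|]". String.split takes a regular expression, and a bare "|" is the alternation operator, so the pipe has to be written as a character class "[|]" or escaped as "\\|" to act as a literal delimiter. The small sketch below isolates that parsing step; SplitDemo and mapLine are hypothetical names, and it assumes well-formed five-column rows like the sample data.

```java
import java.util.Arrays;

public class SplitDemo {
    // Hypothetical helper reproducing the mapper's parsing: returns "TrackId,Shared+Radio".
    static String mapLine(String line) {
        // "[|]" matches one literal '|', yielding the five columns.
        String[] cols = line.split("[|]");
        int total = Integer.parseInt(cols[2]) + Integer.parseInt(cols[3]);
        return cols[1] + "," + total;
    }

    public static void main(String[] args) {
        String line = "111115|222|0|1|0";
        System.out.println(Arrays.toString(line.split("[|]"))); // [111115, 222, 0, 1, 0]
        System.out.println(mapLine(line)); // 222,1
    }
}
```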

3 answers:

Answer 0 (score: 1)

You are writing the result inside the for loop. Move the write outside it:

public static class ListenReduce extends Reducer<Text, IntWritable, Text, IntWritable>
{
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
    {
        int sum=0;
        for(IntWritable x: values)
        {
            sum+=x.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
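Why only 225 is duplicated can be reproduced without a cluster. Hadoop calls reduce once per key with all of that key's values; 222 and 223 each receive a single value, so writing inside the loop happens to emit one line, but 225 receives two values (1 and 1) and every running sum gets written. The sketch below is a plain-Java imitation of both reducer shapes, using hypothetical method names and a List in place of Hadoop's Context.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmitPlacement {
    // Imitates the original reducer: the write sits inside the loop,
    // so every intermediate sum is emitted.
    static List<String> emitInsideLoop(String key, List<Integer> values) {
        List<String> out = new ArrayList<>();
        int sum = 0;
        for (int x : values) {
            sum += x;
            out.add(key + "," + sum);
        }
        return out;
    }

    // Imitates the fixed reducer: one write after the loop has finished.
    static List<String> emitAfterLoop(String key, List<Integer> values) {
        int sum = 0;
        for (int x : values) {
            sum += x;
        }
        List<String> out = new ArrayList<>();
        out.add(key + "," + sum);
        return out;
    }

    public static void main(String[] args) {
        // Key 225 receives one mapper value of 1 from each of its two rows.
        List<Integer> values = Arrays.asList(1, 1);
        System.out.println(emitInsideLoop("225", values)); // [225,1, 225,2]
        System.out.println(emitAfterLoop("225", values));  // [225,2]
    }
}
```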

Answer 1 (score: 1)

Move context.write(key, new IntWritable(sum)); outside the loop, unless you actually want to print the running value of sum after every increment.

I'll assume the trailing period after 225,2 was a typo in the question, since your code doesn't output one.

Answer 2 (score: 1)

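You are calling context.write inside the for loop, which is why you see duplicate keys.

Instead, each key should be written only once, after all of its values have been summed.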

public static class ListenReduce extends Reducer<Text, IntWritable, Text, IntWritable>
{
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
    {
        int sum=0;
        for(IntWritable x: values)
        {
            sum+=x.get();
        }
        // Write it here
        context.write(key, new IntWritable(sum));
    }
}