MapReduce: computing the sum of tab-separated input values

Time: 2016-12-07 05:47:07

Tags: java hadoop mapreduce hdfs hadoop2

I'm trying to use MapReduce to compute the sum of tab-separated input values, grouped by their label. The data looks like this:

1     5.0    4.0   6.0
2     2.0    1.0   3.0
1     3.0    4.0   8.0

The first column is the class label, so I expect the output to be grouped by class label. For this example, the output would be:

label 1: 30.0
label 2: 6.0
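As a quick sanity check of those expected totals (5.0 + 4.0 + 6.0 + 3.0 + 4.0 + 8.0 = 30.0 for label 1, and 2.0 + 1.0 + 3.0 = 6.0 for label 2), here is a minimal plain-Java sketch with no Hadoop involved. It assumes the label is always the first whitespace-separated field; the class name `ExpectedTotals` is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java reference (no Hadoop): sums every value on each line,
// grouped by the label in the first column. Useful to compare against job output.
public class ExpectedTotals {
    static Map<String, Double> totals(String[] lines) {
        Map<String, Double> sums = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            double lineSum = 0.0;
            for (int i = 1; i < fields.length; i++) {
                lineSum += Double.parseDouble(fields[i]);
            }
            // add this line's sum to the running total for its label
            sums.merge(fields[0], lineSum, Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        String[] lines = {
            "1\t5.0\t4.0\t6.0",
            "2\t2.0\t1.0\t3.0",
            "1\t3.0\t4.0\t8.0"
        };
        for (Map.Entry<String, Double> e : totals(lines).entrySet()) {
            System.out.println("label " + e.getKey() + ": " + e.getValue());
        }
    }
}
```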

Here is the code I tried, but the output is wrong: it shows unexpected class labels.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class Total {

    public static class Map extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        private final static DoubleWritable one = new DoubleWritable();
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            // the first token is the class label
            word.set(tokenizer.nextToken());
            // emit one (label, number) pair for every remaining value on the line
            while (tokenizer.hasMoreTokens()) {
                one.set(Double.parseDouble(tokenizer.nextToken()));
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        private Text firstMsg = new Text();

        @Override
        public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            firstMsg.set("label " + key + ": Total");

            double sum = 0.0;
            for (DoubleWritable val : values) {
                sum += val.get();
            }

            context.write(firstMsg, new DoubleWritable(sum));
        }
    }
    //void method implementation also exists
}

1 answer:

Answer 0 (score: 1)

Your goal is to get all records with the same key into the same reducer so that you can sum the numbers.

So, take this

1     5.0    4.0   6.0
2     2.0    1.0   3.0
1     3.0    4.0   8.0

and essentially turn it into this

1     [(5.0    4.0   6.0), (3.0    4.0   8.0)]
2     [(2.0    1.0   3.0)]

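So your mapper should output only the keys 1 and 2, each followed by the remaining values on that line, rather than emitting a separate record for every individual value.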
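To make the "map, then group by key" idea concrete, here is a minimal plain-Java simulation (no Hadoop) of one map() call per line followed by the shuffle grouping. The class and method names are hypothetical; the `TreeMap` stands in for Hadoop's sort-by-key, and note that Hadoop does not guarantee the order of values within a key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of the mapper output plus the shuffle/group phase.
public class ShuffleSketch {
    // One "map" call: key = "label " + first field, value = remaining fields joined by commas.
    static String[] mapLine(String line) {
        String[] fields = line.trim().split("\\s+");
        String key = "label " + fields[0];
        String value = String.join(",", Arrays.copyOfRange(fields, 1, fields.length));
        return new String[] {key, value};
    }

    // "Shuffle": collect every value that shares a key into one list per key.
    static Map<String, List<String>> group(String[] lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] kv = mapLine(line);
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        String[] lines = {"1\t5.0\t4.0\t6.0", "2\t2.0\t1.0\t3.0", "1\t3.0\t4.0\t8.0"};
        for (Map.Entry<String, List<String>> e : group(lines).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Each list is what one reduce() call would see as its `Iterable<Text> values`.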

To do this, you can use Mapper<LongWritable, Text, Text, Text> (change the output value type to Text):

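// word is a field of the mapper class: private Text word = new Text();
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    String line = value.toString();

    StringTokenizer tokenizer = new StringTokenizer(line);
    if (!tokenizer.hasMoreTokens()) {
        return; // skip blank lines
    }
    word.set("label " + tokenizer.nextToken());

    // join the remaining values with commas, e.g. "5.0,4.0,6.0"
    StringBuilder remainder = new StringBuilder();
    while (tokenizer.hasMoreTokens()) {
        remainder.append(tokenizer.nextToken()).append(",");
    }
    // drop the trailing comma; setLength() returns void, so it cannot be chained
    if (remainder.length() > 0) {
        remainder.setLength(remainder.length() - 1);
    }
    context.write(word, new Text(remainder.toString()));
}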

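Then, in the reducer, declare it as Reducer<Text, Text, Text, DoubleWritable> (it reads (Text, Text) pairs). Now you have an Iterable<Text> values: an iteration over comma-separated strings that you can parse into doubles and add to a running sum.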
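The parse-and-sum step can be sketched in plain Java with the Hadoop types replaced by `String`, so it runs standalone. This is a minimal sketch, not the answer's exact code; `ReduceSketch` and `sumValues` are hypothetical names, and in a real `Reducer<Text, Text, Text, DoubleWritable>` you would call `toString()` on each `Text` and finish with `context.write(key, new DoubleWritable(sum))`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone version of the reduce logic: each element of `values`
// is one comma-separated string emitted by the mapper, e.g. "5.0,4.0,6.0".
public class ReduceSketch {
    static double sumValues(Iterable<String> values) {
        double sum = 0.0;
        for (String csv : values) {
            for (String number : csv.split(",")) {
                // "".split(",") yields one empty string, so skip it (line with a label only)
                if (!number.isEmpty()) {
                    sum += Double.parseDouble(number);
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> label1 = Arrays.asList("5.0,4.0,6.0", "3.0,4.0,8.0");
        List<String> label2 = Arrays.asList("2.0,1.0,3.0");
        System.out.println("label 1: " + sumValues(label1));
        System.out.println("label 2: " + sumValues(label2));
    }
}
```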

You don't really need the firstMsg.set piece in the reducer; that can be done in the mapper.