Question

I'm having trouble using Hadoop MapReduce to compute the sum of all values between two bounds.

For example, I want to compute the sum over [1, 15000]. But as far as I know, MapReduce works on data that shares something in common (a key).

I do understand how it works for data like this:

doctor  23
doodle  34
doctor  2
doodle  5

These are occurrence counts of words found in a given text.

MapReduce groups the values for each word together, like this:

doctor [(23 2)]
doodle [(34 5)]

and then sums the values in each group.
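The group-then-sum behaviour described above (the shuffle collects every value emitted for a key, then the reducer adds them up) can be imitated in plain Java without Hadoop. This is only an in-memory illustration, assuming the word/count pairs from the example; the class and method names are hypothetical and it needs Java 16+ for the record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory imitation of shuffle + reduce for word counts (hypothetical names, not Hadoop code).
class ShuffleSketch {
    // A (key, value) pair as a mapper would emit it
    record Pair(String key, int value) {}

    // "Shuffle": collect all values emitted for the same key, e.g. doctor -> [23, 2]
    static Map<String, List<Integer>> group(List<Pair> mapOutput) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Pair p : mapOutput) {
            grouped.computeIfAbsent(p.key(), k -> new ArrayList<>()).add(p.value());
        }
        return grouped;
    }

    // "Reduce": sum the values of each group
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> totals = new TreeMap<>();
        grouped.forEach((k, vs) -> totals.put(k, vs.stream().mapToInt(Integer::intValue).sum()));
        return totals;
    }

    public static void main(String[] args) {
        List<Pair> pairs = List.of(
                new Pair("doctor", 23), new Pair("doodle", 34),
                new Pair("doctor", 2), new Pair("doodle", 5));
        Map<String, List<Integer>> grouped = group(pairs);
        System.out.println(grouped);         // {doctor=[23, 2], doodle=[34, 5]}
        System.out.println(reduce(grouped)); // {doctor=25, doodle=39}
    }
}
```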

But for a plain sum there is never a shared element like the words in the example above. Given the dataset:

DS1: 1 2 3 4 5 ..... 15000

Is it possible to compute the sum of all totients in the list using the MapReduce architecture?
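One way to see that a sum fits MapReduce anyway: since there is no natural key, every mapper can emit its partial sum under the same constant key, and the single resulting group is added up by one reducer. The plain-Java sketch below imitates that, assuming input "splits" of 1,000 numbers each (an arbitrary choice) and hypothetical class and method names; the result can be checked against the closed form n(n+1)/2 = 15000 · 15001 / 2 = 112,507,500.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.LongStream;

// Plain-Java imitation of a keyless sum: one partial sum per split, then one reduce over a single group.
class ConstantKeySum {
    // "Map" phase: each split of the range contributes one partial sum (conceptually under one shared key)
    static List<Long> mapPhase(long from, long to, long splitSize) {
        List<Long> partials = new ArrayList<>();
        for (long start = from; start <= to; start += splitSize) {
            long end = Math.min(start + splitSize - 1, to);
            partials.add(LongStream.rangeClosed(start, end).sum());
        }
        return partials;
    }

    // "Reduce" phase: all partial sums share the same key, so a single reducer adds them
    static long reducePhase(List<Long> partials) {
        return partials.stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        long n = 15000;
        List<Long> partials = mapPhase(1, n, 1000);
        System.out.println(partials.size() + " partial sums, total = " + reducePhase(partials)); // 15 partial sums, total = 112507500
        System.out.println("closed form n(n+1)/2 = " + n * (n + 1) / 2);                        // closed form n(n+1)/2 = 112507500
    }
}
```

The same shape works if each number is first transformed (for instance by a totient function) before being added; only the per-split computation changes.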

Answer 1

If the numbers are in a text file, separated by spaces, you can split each line and sum it in the mapper, like this:

Mapper:

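import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Mapper;

public class SumMapper extends Mapper<LongWritable, Text, NullWritable, IntWritable> {
    // Sum the whitespace-separated integers on one line; emit under one constant key so all partial sums meet in one reduce group
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        int sum = Arrays.stream(value.toString().trim().split("\\s+"))
                .filter(s -> !s.isEmpty()).mapToInt(Integer::parseInt).sum(); // skip the empty token of a blank line
        context.write(NullWritable.get(), new IntWritable(sum));
    }
}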
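Splitting on a single space produces empty tokens for lines with leading, trailing, or repeated spaces, and parsing an empty string throws NumberFormatException, so it can help to test the per-line parsing outside Hadoop. A minimal sketch, assuming a hypothetical helper `lineSum` that a mapper could delegate to; it returns `long` so that sums above Integer.MAX_VALUE do not overflow (for 1..15000 an `int` is still enough).

```java
import java.util.Arrays;

// Hypothetical helper: the per-line parsing a SumMapper could call, testable without a cluster.
class LineSum {
    // Sums whitespace-separated integers; tolerates extra spaces, tabs and blank lines
    static long lineSum(String line) {
        return Arrays.stream(line.trim().split("\\s+"))
                .filter(token -> !token.isEmpty()) // "".split(...) yields one empty token
                .mapToLong(Long::parseLong)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(lineSum("1 2 3 4 5"));        // 15
        System.out.println(lineSum("  10   20\t30 "));   // 60
        System.out.println(lineSum(""));                 // 0
    }
}
```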

Job driver:

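import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class LocalMapReduceRunner {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path output = new Path(args[1]);
        // Delete old output (the job refuses to start if it exists); works for local and HDFS paths, unlike "rm -rf"
        output.getFileSystem(conf).delete(output, true);

        Job job = Job.getInstance(conf);
        job.setJobName("MR_runner");
        job.setJarByClass(LocalMapReduceRunner.class);

        job.setMapperClass(SumMapper.class);
        // Every map output has the NullWritable key, so one reducer adds up all per-line partial sums
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, output);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}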
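To try the job locally, an input file like DS1 has to exist first. A minimal sketch that generates one, assuming a hypothetical file name `ds1.txt` and 1,000 numbers per line (several lines, so the job really has several partial sums to combine); it uses only `java.nio` and re-reads the file to compute the total the job should report.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;
import java.util.stream.Stream;

// Writes 1..n as space-separated integers, perLine numbers per line (hypothetical test input).
class GenerateInput {
    static void write(Path file, long n, int perLine) throws IOException {
        List<String> lines = new ArrayList<>();
        for (long start = 1; start <= n; start += perLine) {
            long end = Math.min(start + perLine - 1, n);
            lines.add(LongStream.rangeClosed(start, end)
                    .mapToObj(Long::toString)
                    .collect(Collectors.joining(" ")));
        }
        Files.write(file, lines);
    }

    // Re-reads the file and sums every number, i.e. the value the MapReduce job should output
    static long expectedSum(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.flatMap(l -> Arrays.stream(l.trim().split("\\s+")))
                    .filter(t -> !t.isEmpty())
                    .mapToLong(Long::parseLong)
                    .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of(args.length > 0 ? args[0] : "ds1.txt");
        write(file, 15000, 1000);
        System.out.println(expectedSum(file)); // 112507500
    }
}
```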

Thanks to @cricket_007 for the suggestion.

Hadoop MapReduce sum
