Question

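I am new to Hadoop. I want to run a MapReduce example and see its results broken down by the mapper that computed them. In other words, I want to know which mapper computed each intermediate result. Is that possible? If so, how?

I have installed Hadoop 2.9.0 (multi-node cluster).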

Answer 1

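First, let's look at the sample code (I have an HDP cluster installed, so the path to the .jar file may differ on your system).

Sample text files as input: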

$ bin/hadoop dfs -ls /wordcount/input/

/wordcount/input/file01

/wordcount/input/file02

$ bin/hadoop dfs -cat /wordcount/input/file01

Hello World Bye World

$ bin/hadoop dfs -cat /wordcount/input/file02

Hello Hadoop Goodbye Hadoop

Run the application:

$ bin/hadoop jar /usr/hdp/2.6x.x/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount /wordcount/input /wordcount/output

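Note: you do not need to write the word count program yourself. It ships by default in the mapreduce folder, as mentioned above. The code given below is for reference only.

Output:

$ bin/hadoop dfs -cat /wordcount/output/part-00000

Bye 1

Goodbye 1

Hadoop 2

Hello 2

World 2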

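Now let's look at how the mapper and reducer work behind the scenes:

The WordCount application is quite straightforward.

The Mapper implementation (lines 14-26), via the map method (lines 18-25), processes one line at a time, as provided by the specified TextInputFormat (line 49). It then splits the line into whitespace-separated tokens using StringTokenizer and emits a key-value pair <word, 1> for each token.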

For the given sample input, the first map emits:

<Hello, 1>

<World, 1>

<Bye, 1>

<World, 1>

The second map emits:

<Hello, 1>

<Hadoop, 1>

<Goodbye, 1>

<Hadoop, 1>
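The map step described above can be sketched in plain Java without any Hadoop dependency. This is only an illustration under stated assumptions: the class name MapPhaseSketch and the labels "map 1" and "map 2" are hypothetical, and the sketch assumes one map per input file, as in the walkthrough. It is not how Hadoop itself reports which task produced a record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class MapPhaseSketch {

    // Mimics the map method: split one line on whitespace
    // and emit a "<word, 1>" pair for every token.
    public static List<String> mapLine(String line) {
        List<String> emitted = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            emitted.add("<" + tokenizer.nextToken() + ", 1>");
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Assumption: each sample file is handled by its own simulated map.
        String[] inputs = {
            "Hello World Bye World",       // contents of file01
            "Hello Hadoop Goodbye Hadoop"  // contents of file02
        };
        for (int i = 0; i < inputs.length; i++) {
            System.out.println("map " + (i + 1) + " emits " + mapLine(inputs[i]));
        }
    }
}
```

Printing the simulated map index next to each list makes it easy to see which input produced which intermediate pairs.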

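We'll learn more about the number of maps spawned for a given job, and how to control them in a fine-grained manner, a bit later in the tutorial.

WordCount also specifies a combiner (line 46). Hence, after being sorted by key, the output of each map is passed through the local combiner (which, per the job configuration, is the same class as the Reducer) for local aggregation.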

The output of the first map:

<Bye, 1>

<Hello, 1>

<World, 2>

The output of the second map:

<Goodbye, 1>

<Hadoop, 2>

<Hello, 1>
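The local aggregation done by the combiner can be imitated in plain Java as well. The sketch below is an approximation, not Hadoop's actual combiner machinery: the class name CombinerSketch is hypothetical, and a TreeMap stands in for the sort-by-key step so that the keys come out in the same order as in the lists above.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class CombinerSketch {

    // Counts the words of a single map's input line.
    // TreeMap keeps the keys sorted, standing in for the sort before the combiner.
    public static Map<String, Integer> combine(String line) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            // Add 1 to the running count of this word (starting from 1 if absent).
            counts.merge(tokenizer.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("first map:  " + combine("Hello World Bye World"));
        System.out.println("second map: " + combine("Hello Hadoop Goodbye Hadoop"));
    }
}
```

Each call sees only one map's input, which is why Hello still has a count of 1 in both results at this stage.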

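The Reducer implementation (lines 28-36), via the reduce method (lines 29-35), simply sums up the values, which are the occurrence counts for each key (i.e. the words in this example).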

Thus the output of the job is:

<Bye, 1>

<Goodbye, 1>

<Hadoop, 2>

<Hello, 2>

<World, 2>
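To make the final summation concrete, here is a small plain-Java sketch of the reduce step. It assumes the per-map partial counts are already available as in-memory maps; the class name ReduceSketch and the helper partial are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSketch {

    // Sums the partial counts of every map output, key by key.
    public static Map<String, Integer> reduce(List<Map<String, Integer>> mapOutputs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> output : mapOutputs) {
            for (Map.Entry<String, Integer> entry : output.entrySet()) {
                totals.merge(entry.getKey(), entry.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    // Hypothetical helper: builds one map's combined output from key/count pairs.
    public static Map<String, Integer> partial(Object... keysAndCounts) {
        Map<String, Integer> result = new TreeMap<>();
        for (int i = 0; i < keysAndCounts.length; i += 2) {
            result.put((String) keysAndCounts[i], (Integer) keysAndCounts[i + 1]);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> mapOutputs = new ArrayList<>();
        mapOutputs.add(partial("Bye", 1, "Hello", 1, "World", 2));      // first map
        mapOutputs.add(partial("Goodbye", 1, "Hadoop", 2, "Hello", 1)); // second map
        System.out.println(reduce(mapOutputs));
    }
}
```

Only Hello appears in both partial results, so it is the only key whose final count differs from a single map's count.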

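The main method (lines 38-57) specifies various facets of the job in the JobConf, such as the input/output paths (passed via the command line), key/value types, and input/output formats. It then calls JobClient.runJob (line 55) to submit the job and monitor its progress.

Now, here is the word count program referred to above: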

1.  package org.myorg;
2.
3.  import java.io.IOException;
4.  import java.util.*;
5.
6.  import org.apache.hadoop.fs.Path;
7.  import org.apache.hadoop.conf.*;
8.  import org.apache.hadoop.io.*;
9.  import org.apache.hadoop.mapred.*;
10. import org.apache.hadoop.util.*;
11.
12. public class WordCount {
13.
14.    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
15.      private final static IntWritable one = new IntWritable(1);
16.      private Text word = new Text();
17.
18.      public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
19.        String line = value.toString();
20.        StringTokenizer tokenizer = new StringTokenizer(line);
21.        while (tokenizer.hasMoreTokens()) {
22.          word.set(tokenizer.nextToken());
23.          output.collect(word, one);
24.        }
25.      }
26.    }
27.
28.    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
29.      public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
30.        int sum = 0;
31.        while (values.hasNext()) {
32.          sum += values.next().get();
33.        }
34.        output.collect(key, new IntWritable(sum));
35.      }
36.    }
37.
38.    public static void main(String[] args) throws Exception {
39.      JobConf conf = new JobConf(WordCount.class);
40.      conf.setJobName("wordcount");
41.
42.      conf.setOutputKeyClass(Text.class);
43.      conf.setOutputValueClass(IntWritable.class);
44.
45.      conf.setMapperClass(Map.class);
46.      conf.setCombinerClass(Reduce.class);
47.      conf.setReducerClass(Reduce.class);
48.
49.      conf.setInputFormat(TextInputFormat.class);
50.      conf.setOutputFormat(TextOutputFormat.class);
51.
52.      FileInputFormat.setInputPaths(conf, new Path(args[0]));
53.      FileOutputFormat.setOutputPath(conf, new Path(args[1]));
54.
55.      JobClient.runJob(conf);
57.    }
58. }

Reference: MapReduce Tutorial

Hadoop: which mapper returned which result?
