Question

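I ran a simple wordcount MapReduce example and added a combiner with one small change to its output. The combiner's output is not being merged by the reducer. The scenario is as follows.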

Test: Mapper -> Combiner -> Reducer

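In the combiner, I added two extra lines that output the word "different" with a count of 1. The reducer does not sum the counts for "different". The output is pasted below.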

Text t = new Text("different"); // Added my own output
context.write(t, new IntWritable(1)); // Added my own output

public class wordcountcombiner extends Reducer<Text, IntWritable, Text, IntWritable>{

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
  {
    int sum = 0;
    for (IntWritable val : values)
    {
        sum += val.get();
    }
    context.write(key, new IntWritable(sum));
    Text t = new Text("different"); // Added my own output
    context.write(t, new IntWritable(1)); // Added my own output
  }
}

Input:

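I ran a simple wordcount MapReduce example adding a combiner with a small change in the combiner output. The output of the combiner is not merged by the reducer. The scenario is as follows. In the combiner I added two extra lines to output a word different and count 1; the reducer is not summing the "different" word count. Output pasted below.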

Output:

"different" 1
different   1
different   1
I           2
different   1
In          1
different   1
MapReduce   1
different   1
The         1
different   1
...

How can this happen?

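Full code: I am running the wordcount program with a combiner. Just for fun I tweaked the combiner, and that is how I ran into this issue. I have three separate classes for the mapper, combiner, and reducer.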

Driver:

public class WordCount {

  public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
    // TODO Auto-generated method stub

    Job job = Job.getInstance(new Configuration());
    job.setJarByClass(wordcountmapper.class);
    job.setJobName("Word Count");

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setMapperClass(wordcountmapper.class);
    job.setCombinerClass(wordcountcombiner.class);
    job.setReducerClass(wordcountreducer.class);
    job.getConfiguration().set("fs.file.impl", "com.conga.services.hadoop.patch.HADOOP_7682.WinLocalFileSystem");       

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    System.exit(job.waitForCompletion(true)? 0 : 1);

  }

}

Mapper:

public class wordcountmapper extends Mapper<LongWritable, Text, Text, IntWritable> {

  private Text word = new Text();
  IntWritable one = new IntWritable(1);
  @Override
  public void map(LongWritable key, Text value, Context context) 
        throws IOException, InterruptedException 
  {
    String line = value.toString();
    StringTokenizer token = new StringTokenizer(line);
    while (token.hasMoreTokens())
    {
        word.set(token.nextToken());
        context.write(word, one);
    }
  }
}

Combiner:

public class wordcountcombiner extends Reducer<Text, IntWritable, Text, IntWritable>{

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
  {
    int sum = 0;
    for (IntWritable val : values)
    {
        sum += val.get();
    }
    context.write(key, new IntWritable(sum));
    Text t = new Text("different");
    context.write(t, new IntWritable(1));
  }
}

Reducer:

public class wordcountreducer extends Reducer<Text, IntWritable, Text, IntWritable>{

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
  {
    int sum = 0;
    for (IntWritable val : values)
    {
        sum += val.get();
    }
    context.write(key, new IntWritable(sum));
  }
}

Answer 1

The output is as expected, because two of your lines do the wrong thing. Why do you have this code?

Text t = new Text("different"); // Added my own output
context.write(t, new IntWritable(1)); // Added my own output

In your reducer you compute the sum, and then you additionally write "different 1" to the output.

Answer 2

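In the reduce function you are writing a new "1 different" record to the job's final output without doing any kind of aggregation. The reduce function is called once per key: as you can see from the method signature, it takes a key and the list of values for that key as arguments.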

Since each word is a key, and every call to reduce writes "1 different" to the output, you get one such record for every distinct word in the input data.
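The once-per-key behaviour described above can be illustrated without Hadoop. The following is only a plain-Java sketch: `PerKeyReduceSketch`, `reduce`, and `run` are hypothetical names, the `TreeMap` stands in for the framework's sort-and-group step, and strings stand in for `Text`/`IntWritable` records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation (no Hadoop); all names here are hypothetical.
class PerKeyReduceSketch {

    // Mimics the question's reduce(): writes the summed key, then an extra ("different", 1).
    static void reduce(String key, List<Integer> values, List<String> out) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        out.add(key + "\t" + sum);
        out.add("different\t1"); // the extra record added in the question
    }

    // Groups the words by key (sorted), then calls reduce() once per distinct key.
    static List<String> run(List<String> words) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String w : words) {
            grouped.computeIfAbsent(w, k -> new ArrayList<>()).add(1);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            reduce(e.getKey(), e.getValue(), out);
        }
        return out;
    }

    public static void main(String[] args) {
        // Three distinct keys, so three extra "different" records are written.
        run(List.of("alpha", "beta", "alpha", "gamma")).forEach(System.out::println);
    }
}
```

With three distinct input words, the sketch emits three "different" records, one per reduce call, interleaved with the real counts.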

Answer 3

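Hadoop requires that the reduce method in a combiner write only the same key that it received as input. This is required because Hadoop sorts the keys only before the combiner is called; it does not re-sort them after the combiner has run. In your program, the reduce method writes the key "different" in addition to the key it received as input. As a result, the key "different" appears at several different positions in the key order, and these occurrences are not merged before being passed to the reducer.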

For example:

Suppose the sorted list of keys output by the mapper is: "alpha", "beta", "gamma"

Your combiner is then called three times (once for "alpha", once for "beta", and once for "gamma") and produces the keys "alpha", "different", then the keys "beta", "different", then the keys "gamma", "different".

The "sorted" (but actually unsorted) list of keys after the combiner has run is:

"alpha", "different", "beta", "different", "gamma", "different"

This list is not sorted again, so the separate occurrences of "different" are not merged.

The reducer is then called six separate times, and the key "different" appears three times in the reducer's output.
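The effect in this example can be reproduced with a small plain-Java sketch. It assumes, as this answer describes, that the reduce side only groups adjacent equal keys and trusts its input to be sorted. `AdjacentGroupingSketch` and `groupAdjacent` are hypothetical names, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java simulation (no Hadoop); class and method names are hypothetical.
class AdjacentGroupingSketch {

    // Starts a new group whenever the key differs from the previous one,
    // as a consumer that trusts its input to be sorted would do.
    static List<String> groupAdjacent(List<String> keys) {
        List<String> groups = new ArrayList<>();
        String previous = null;
        for (String key : keys) {
            if (!key.equals(previous)) {
                groups.add(key); // one more simulated reduce() call
            }
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> combinerOutput =
                List.of("alpha", "different", "beta", "different", "gamma", "different");

        // Unsorted stream: every key starts its own group, six in total.
        System.out.println(groupAdjacent(combinerOutput));

        // If the stream were sorted again, the three "different" keys would be adjacent.
        List<String> resorted = new ArrayList<>(combinerOutput);
        Collections.sort(resorted);
        System.out.println(groupAdjacent(resorted));
    }
}
```

On the unsorted list the sketch yields six groups, with "different" forming three of them. After sorting, it yields four groups and a single "different", which is what the question expected to see.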

Why is the Hadoop combiner output not merged by the reducer?
