Question

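I recently started working with Hadoop and have only just learned some of the basic theory. I am trying to solve a task where the input is given in a text file, e.g. input.txt (1 10 37 5 4 98 100 etc.)

I need to find the biggest integer in the given input (the values are of integer type). I tried to pass the input into an ArrayList so that I could compare the first integer with all the remaining ones (using a for loop).

1) Is it possible to find the solution this way? If so, I have not been able to create an ArrayList in Hadoop and need some hints :-)

2) Can we print only the 'key' instead of the key-value pair? If so, please help me. I tried writing code in the reduce function that does not print the value, but I ran into some errors.

Please give me some hints so I can move forward. Thanks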

Answer 1

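In the map step, you can map all the numbers to a single key. Then, in the reduce step, you can take the maximum. The reduce step is passed an iterable collection of the values for the given key, so there is no need to create your own ArrayList.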
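To make that data flow concrete, here is a minimal sketch in plain Java that imitates the map, group-by-key and reduce steps without any Hadoop classes. The class name `SingleKeyMaxSketch`, the key name `"max"` and both helper methods are made up for illustration; a real job would use a Mapper and a Reducer instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java imitation of map -> group by key -> reduce; no Hadoop classes are used.
class SingleKeyMaxSketch {
    static final String SHARED_KEY = "max"; // hypothetical key name

    // "Map" step: emit every number on the line under the same key,
    // then group the emitted values by key as the shuffle would.
    static Map<String, List<Long>> mapAndGroup(String line) {
        Map<String, List<Long>> grouped = new HashMap<>();
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) continue; // skip blank input
            grouped.computeIfAbsent(SHARED_KEY, k -> new ArrayList<>())
                   .add(Long.parseLong(token));
        }
        return grouped;
    }

    // "Reduce" step: receives all values for one key as an Iterable
    // and keeps the largest, so no hand-built ArrayList is needed here.
    static long reduceMax(Iterable<Long> values) {
        long max = Long.MIN_VALUE;
        for (long v : values) {
            if (v > max) max = v;
        }
        return max;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> grouped = mapAndGroup("1 10 37 5 4 98 100");
        System.out.println(reduceMax(grouped.get(SHARED_KEY))); // prints 100
    }
}
```

Because every number shares one key, the reduce step sees them all in a single call, which is what lets it pick the overall maximum.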

Answer 2

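You are better off with a single reducer for this.

To make sure all the numbers reach the same reducer, you have to do two things:

Emit the same key for every input value in the mapper
Set the number of reduce tasks to one.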
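As a rough illustration of why these two points work together, the sketch below mirrors the formula used by Hadoop's default HashPartitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. It uses a plain `String` in place of `Text` (whose hash code is computed differently), and the class and method names are invented for this example.

```java
// Imitates how a hash-based partitioner picks a reduce task for a key.
// String stands in for Hadoop's Text here, so actual partition numbers may differ.
class PartitionSketch {
    static int partitionFor(String key, int numReduceTasks) {
        // Mask off the sign bit so the result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4; // hypothetical reducer count
        int first = partitionFor("MyAwesomeKey", reducers);
        int second = partitionFor("MyAwesomeKey", reducers);
        System.out.println(first == second);                 // same key, same partition
        System.out.println(partitionFor("MyAwesomeKey", 1)); // one reduce task: partition 0
    }
}
```

The same key always lands in the same partition, and with a single reduce task every key lands in partition 0, so only one reducer runs and produces one output.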

Your map() method could look like this:

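@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    // Assumes the default TextInputFormat: key is the byte offset, value is the line of numbers.
    for (String token : value.toString().trim().split("\\s+")) {
        if (!token.isEmpty()) context.write(new Text("MyAwesomeKey"), new LongWritable(Long.parseLong(token)));
    }
}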

In your Reduce class, keep a field max, for example: Long max

The reduce() method could then look like this:

@Override
public void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
    // Only track the largest value seen so far in the max field; nothing is written here.
    for (LongWritable value : values) {
        if (max == null || value.get() > max) max = value.get();
    }
}

Then override run() as well, just as we overrode reduce():

@Override
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKey()) {
        reduce(context.getCurrentKey(), context.getValues(), context);
    }
    // All keys have been processed; write the max value once (skipped if there was no input).
    if (max != null) context.write(new LongWritable(max), new Text(""));
    cleanup(context);
}

To set the number of reduce tasks to 1, do the following in your job's run(). Note that this is a different run() from the one above:

job.setNumReduceTasks(1);

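Note: all the code above follows the new mapreduce API. I believe that with the old mapred API we would not have a single hook point after the reducer has finished its work, which here we get by overriding the Reducer's run().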

Finding the biggest integer value in Hadoop (using Java programming)
