Question

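I am trying to implement a statistical formula that requires comparing each data point with every other data point. For example, my dataset looks something like:

I need to go through this file like so: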

for (int i = 0; i < data.length; i++)
    for (int j = 0; j < data.length; j++)
        sum += data[i] + data[j];

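Basically, as each line comes through my map function, I need to run some instructions in the reducer against the rest of the file, just as in a nested for loop. So far I have tried using the DistributedCache and some form of ChainMapper, but to no avail. Any ideas on how I could do this would be greatly appreciated. Even an out-of-the-box approach would help.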

Answer 1

You need to override the run method of the Reducer class.

public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKey()) {
        // Key and values for the outer loop (index i).
        // Hadoop reuses the key object, so copy it before advancing.
        Text currentKey = new Text(context.getCurrentKey());
        Iterable<VALUEIN> currentValues = context.getValues();
        // Read or copy the values for i here; they are not valid after nextKey().
        if (context.nextKey()) {
            // The context now points at the next key and its values (index j).
            // This consumes that key, so the outer loop will not see it again.
        }
    }
    cleanup(context);
}
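
Because run() only moves forward through its input, comparing every record with every other one usually means holding the records somewhere first. Below is a minimal sketch in plain Java, assuming all values reach a single reducer call and fit in memory. The class and method names are hypothetical and no Hadoop types are used; the Iterable stands in for what context.getValues() would supply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: shows the buffering idea only.
class PairwiseBufferSketch {

    // Copies the single-pass input into a list, then runs the nested loop.
    static double pairwiseSum(Iterable<Double> values) {
        List<Double> buffer = new ArrayList<>();
        for (Double v : values) {
            buffer.add(v); // copy each value out of the one-shot iterable
        }
        double sum = 0.0;
        for (int i = 0; i < buffer.size(); i++) {
            for (int j = 0; j < buffer.size(); j++) {
                sum += buffer.get(i) + buffer.get(j);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(pairwiseSum(Arrays.asList(1.0, 2.0, 3.0))); // prints 36.0
    }
}
```

The buffer grows with the number of records, so this only suits inputs small enough for one reducer's memory.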

Alternatively, if you have no reducer, you can do the same thing in the Mapper by overriding its run method:

public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKeyValue()) {
        // Calling context.nextKeyValue() again here advances to the next
        // key/value pair, which plays the role of the inner (second) loop.
    }
    cleanup(context);
}
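
For the specific sum in the question, a single forward pass such as the loop above may already be enough, because the double sum of (data[i] + data[j]) over all i and j equals 2 * n * (sum of data). The sketch below is plain Java with hypothetical names and no Hadoop types. It only applies if the real statistic decomposes this way, which is an assumption.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Hadoop.
class SinglePassSumSketch {

    // One pass: track the count and running total, then apply 2 * n * total.
    static double pairwiseSumSinglePass(Iterable<Double> values) {
        long n = 0;
        double total = 0.0;
        for (double v : values) {
            n++;
            total += v;
        }
        return 2.0 * n * total;
    }

    public static void main(String[] args) {
        System.out.println(pairwiseSumSinglePass(Arrays.asList(1.0, 2.0, 3.0))); // prints 36.0
    }
}
```

Since the count and the total are both plain sums, they could be accumulated per mapper and combined in a reducer.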

Let me know if this helps.

Hadoop: Implementing a nested for loop in MapReduce [Java]
