Question

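I have a mapper class that produces a few dozen rows. The MapReduce framework then sorts and merges this output. After the sort, I want the reducer to output only the first 5 records. How can I do this? I keep a count variable that is incremented in the reduce method, but it doesn't work: all records appear in the output. I think this is because reduce is called once for each input row to the reducer, so count is initialized to 0 every time. Is there a way to maintain a global variable?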

public class Reduce2 extends Reducer {

A Car
A House

}

Answer 1

The Reducer's run() method executes once, and it calls the reduce() method for each key. Below is the default code of the Reducer's run() method.

public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  try {
    while (context.nextKey()) {
      reduce(context.getCurrentKey(), context.getValues(), context);
      // If a back up store is used, reset it
      Iterator<VALUEIN> iter = context.getValues().iterator();
      if (iter instanceof ReduceContext.ValueIterator) {
        ((ReduceContext.ValueIterator<VALUEIN>) iter).resetBackupStore();
      }
    }
  } finally {
    cleanup(context);
  }
}

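So if you define the count variable inside the reduce() method, it is initialized again on every call (once per key). Instead, override the Reducer's run() method in your reducer implementation and move the count variable into this run() method.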

@Override
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  // Declared once per run() call, so it survives across reduce() calls.
  int count = 0;
  try {
    // Stop after 5 keys have been passed to reduce().
    while (context.nextKey() && count < 5) {
      count++;
      reduce(context.getCurrentKey(), context.getValues(), context);
      // If a back up store is used, reset it
      Iterator<Text> iter = context.getValues().iterator();
      if (iter instanceof ReduceContext.ValueIterator) {
        ((ReduceContext.ValueIterator<Text>) iter).resetBackupStore();
      }
    }
  } finally {
    cleanup(context);
  }
}

This should work.
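To see why the placement of the counter matters, here is a minimal plain-Java sketch that imitates the run()/reduce() call pattern without Hadoop. The class TopNDemo and its methods are hypothetical stand-ins introduced only for illustration; they are not Hadoop APIs, and the sorted keys are modeled as a simple list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class TopNDemo {

    // Stand-in for reduce() with the counter declared locally (assumed to be
    // what the question did): count starts at 0 on every call, so the check
    // below never filters anything out.
    static void reduceWithLocalCount(String key, List<String> out) {
        int count = 0;
        if (count < 5) {
            out.add(key);
        }
        count++; // discarded as soon as the method returns
    }

    // Stand-in for the default run(): reduce is called once per key.
    static List<String> runDefault(List<String> sortedKeys) {
        List<String> out = new ArrayList<>();
        for (String key : sortedKeys) {
            reduceWithLocalCount(key, out);
        }
        return out;
    }

    // Stand-in for the overridden run(): the counter lives in the loop that
    // drives reduce, so it persists across keys and stops the loop early.
    static List<String> runWithLimit(List<String> sortedKeys, int limit) {
        List<String> out = new ArrayList<>();
        Iterator<String> keys = sortedKeys.iterator();
        int count = 0;
        while (count < limit && keys.hasNext()) {
            out.add(keys.next());
            count++;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        System.out.println(runDefault(keys));      // prints [a, b, c, d, e, f, g]
        System.out.println(runWithLimit(keys, 5)); // prints [a, b, c, d, e]
    }
}
```

Keep in mind that a limit applied inside run() holds per reducer task. Assuming the job runs with a single reducer, the first 5 keys it sees are the first 5 keys overall; with several reducers, each one would emit its own first 5.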

Limiting the reducer's output
