Question

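The last reducer is very slow compared with the others. My map and reduce counts are as follows: 18784 map tasks and 1500 reduce tasks. Each reduce takes about 1'26" on average, but the last one takes about 2 hours. I tried changing the number of reducers and reducing the size of the job, but nothing changed.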

public int getPartition(Object key, Object value, int numPartitions) {
    // TODO Auto-generated method stub
    String keyStr = key.toString();
    int partId= String.valueOf(keyStr.hashCode()).hashCode();
    partId = Math.abs(partId % numPartitions);
    partId = Math.max(partId, 0);
    return partId;
    //return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
}

Answer 1

Most likely you are running into a data skew problem.

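Either your keys are not well distributed, or your getPartition is causing the problem. It is not clear to me why you create a string from the hash code of the key string and then take the hash code of that new string. My suggestion is to try the default partitioner first and then look at the distribution of your keys.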
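To look at the key distribution before rerunning the whole job, something like the following can be run on a sample of keys. It is only a sketch in plain Java (no Hadoop classes): the class name `PartitionSkewCheck`, its helper methods, and the sample data are made up for illustration, and it assumes the keys are available as `String`s. The first formula is the one in the commented-out line of the question, which is also what Hadoop's default `HashPartitioner` computes from `key.hashCode()`; the second is the double hash from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Hadoop: estimates how evenly a sample of
// keys would be spread over reducers by two partitioning formulas.
class PartitionSkewCheck {

    // Formula from the commented-out line in the question.
    static int defaultPartition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Double-hash formula from the question's getPartition.
    static int questionPartition(String key, int numPartitions) {
        int partId = String.valueOf(key.hashCode()).hashCode();
        return Math.abs(partId % numPartitions);
    }

    // Counts how many records land in each partition.
    static long[] histogram(List<String> keys, int numPartitions, boolean useDefault) {
        long[] counts = new long[numPartitions];
        for (String key : keys) {
            int p = useDefault ? defaultPartition(key, numPartitions)
                               : questionPartition(key, numPartitions);
            counts[p]++;
        }
        return counts;
    }

    // Busiest partition divided by the average; values far above 1 suggest skew.
    static double skewRatio(long[] counts) {
        long max = Arrays.stream(counts).max().orElse(0);
        double avg = Arrays.stream(counts).average().orElse(0);
        return avg == 0 ? 0 : max / avg;
    }

    public static void main(String[] args) {
        // Made-up sample data: one hot key plus a few rare keys.
        List<String> keys = new ArrayList<>(Collections.nCopies(1000, "hot"));
        keys.addAll(Arrays.asList("a", "b", "c"));
        int numPartitions = 4; // illustrative only; the question uses 1500 reducers
        System.out.println("default:  " + skewRatio(histogram(keys, numPartitions, true)));
        System.out.println("question: " + skewRatio(histogram(keys, numPartitions, false)));
    }
}
```

If one key dominates the sample, both formulas show a high ratio, because every record with the same key always goes to the same reducer. In that case changing the partitioner or the number of reducers will not help, which would match what the question describes.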

Answer 2

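In fact, when you process a large amount of data, you should set a Combiner class. And if you want to change the encoding, you should rewrite the reduce function. For example: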

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class GramModelReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    private LongWritable result = new LongWritable();

    @Override
    public void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all counts for this key.
        long sum = 0;
        for (LongWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        // Write the key as GB18030-encoded bytes (Text normally holds UTF-8).
        context.write(new Text(key.toString().getBytes("GB18030")), result);
    }
}

// Combiner: pre-sums counts on the map side so less data reaches the reducers.
class GramModelCombiner extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    public void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable val : values) {
            sum += val.get();
        }
        context.write(key, new LongWritable(sum));
    }
}
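To see why a summing combiner can take load off a slow reducer, here is a small plain-Java illustration of what happens to the output of one map task. It does not use Hadoop; `CombinerEffectSketch`, `combine`, and the sample words are made up for this sketch, and it assumes each map output record is a (key, 1) pair. In a real job the combiner would be registered on the `Job` with `setCombinerClass(GramModelCombiner.class)`, and Hadoop may apply it zero or more times, so it has to be safe to run repeatedly (a sum is).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration (no Hadoop classes) of what a summing combiner does
// to the output of a single map task before the shuffle.
class CombinerEffectSketch {

    // Pre-aggregates (key, 1) records locally, the way a summing combiner would.
    static Map<String, Long> combine(List<String> mapOutputKeys) {
        Map<String, Long> partialSums = new HashMap<>();
        for (String key : mapOutputKeys) {
            partialSums.merge(key, 1L, Long::sum);
        }
        return partialSums;
    }

    public static void main(String[] args) {
        // Made-up map output with one hot key.
        List<String> mapOutput = Arrays.asList("the", "the", "the", "the", "cat", "the", "sat");
        Map<String, Long> combined = combine(mapOutput);
        System.out.println("records without combiner: " + mapOutput.size());
        System.out.println("records with combiner:    " + combined.size());
        System.out.println("partial sums: " + combined);
    }
}
```

With a hot key, each map task then sends one partial sum for that key instead of one record per occurrence, so the reducer that owns the key has far fewer values to iterate over.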

The last reducer in MapReduce is very slow
