I defined a class named EquivalenceClsAggValue, which has a data field that holds a list (called aggValues).
public class EquivalenceClsAggValue extends Configured implements WritableComparable<EquivalenceClsAggValue> {
    public ArrayList<SortedMapWritable> aggValues;
It has a method that takes another object of type EquivalenceClsAggValue and merges that object's aggValues into this instance's aggValues, as follows:
public void addEquivalenceCls(EquivalenceClsAggValue eq) {
    // comment: eq contains only one entry as it comes from the mapper
    if (this.aggValues.size() == 0) { // new line
        this.aggValues = eq.aggValues;
        return;
    }
    for (int i = 0; i < eq.aggValues.size(); i++) {
        SortedMapWritable cm = aggValues.get(i);    // cm: current map
        SortedMapWritable nm = eq.aggValues.get(i); // nm: new map
        Text nk = (Text) nm.firstKey();             // nk: new key
        if (cm.containsKey(nk)) { // increment the value
            IntWritable ovTmp = (IntWritable) cm.get(nk);
            int ov = ovTmp.get();
            cm.remove(nk);
            cm.put(nk, new IntWritable(ov + 1));
        }
        else { // add new entry
            cm.put(nk, new IntWritable(1));
        }
    }
}
However, this function does not merge the two aggValues. Can someone help me figure out why?
This is how I call the method:
public void reduce(IntWritable keyin, Iterator<EquivalenceClsAggValue> valuein, OutputCollector<IntWritable, EquivalenceClsAggValue> output, Reporter arg3) throws IOException {
    EquivalenceClsAggValue comOutput = valuein.next(); // initialize the output with the first input
    while (valuein.hasNext()) {
        EquivalenceClsAggValue e = valuein.next();
        comOutput.addEquivalenceCls(e);
    }
    output.collect(keyin, comOutput);
}
Answer 0 (score: 1)
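It looks like you are running into object reuse. Hadoop re-uses the same object, so every call to valuein.next() actually returns the same object reference; only the contents of that object are re-initialized through the readFields method. In your reducer, comOutput and e therefore point to the same object, and each call to next() overwrites what you have accumulated so far.
Try changing the reducer as follows (create a new instance to aggregate into):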
// aggregate into a separate instance instead of the (reused) first value
EquivalenceClsAggValue comOutput = new EquivalenceClsAggValue();
while (valuein.hasNext()) {
    EquivalenceClsAggValue e = valuein.next();
    comOutput.addEquivalenceCls(e);
}
output.collect(keyin, comOutput);
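The reuse behaviour described above can be reproduced without Hadoop. The following is only a minimal sketch in plain Java: Box and reusingIterator are hypothetical stand-ins (not Hadoop classes) for a Writable and for the reducer's value iterator, and the sum stands in for the real merge.

```java
import java.util.Iterator;

class ReuseDemo {
    // Hypothetical stand-in for a Writable: one mutable field that the
    // "framework" overwrites in place, as readFields would.
    static final class Box {
        int value;
    }

    // Mimics the reducer's value iterator: every next() returns the SAME
    // Box instance, refilled with the next input value.
    static Iterator<Box> reusingIterator(int... inputs) {
        final Box shared = new Box();
        return new Iterator<Box>() {
            private int pos = 0;
            @Override public boolean hasNext() { return pos < inputs.length; }
            @Override public Box next() {
                shared.value = inputs[pos++];
                return shared;
            }
        };
    }

    // Pattern from the question: keep the first returned reference as the accumulator.
    static int sumKeepingFirstReference(int... inputs) {
        Iterator<Box> it = reusingIterator(inputs);
        Box acc = it.next();
        while (it.hasNext()) {
            Box e = it.next();    // acc and e are the same object, so acc was just overwritten
            acc.value += e.value;
        }
        return acc.value;
    }

    // Pattern from the answer: accumulate into a separate, fresh instance.
    static int sumIntoFreshInstance(int... inputs) {
        Iterator<Box> it = reusingIterator(inputs);
        Box acc = new Box();
        while (it.hasNext()) {
            acc.value += it.next().value;
        }
        return acc.value;
    }

    public static void main(String[] args) {
        System.out.println(sumKeepingFirstReference(5, 1, 1)); // prints 2, not 7
        System.out.println(sumIntoFreshInstance(5, 1, 1));     // prints 7
    }
}
```

In the first variant the accumulator is silently replaced by each new value, which matches the symptom in the question that the two aggValues never appear to be merged.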
Edit: you will probably also need to update the aggregation method (again, be wary of object reuse):
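public void addEquivalenceCls(EquivalenceClsAggValue eq) {
    // comment: eq contains only one entry as it comes from the mapper
    for (int i = 0; i < eq.aggValues.size(); i++) {
        // a freshly created aggregate starts with no maps, so grow the list until index i exists
        // (assumes the constructor initializes aggValues to an empty list; not shown in the question)
        while (aggValues.size() <= i) {
            aggValues.add(new SortedMapWritable());
        }
        SortedMapWritable cm = aggValues.get(i);    // cm: current map
        SortedMapWritable nm = eq.aggValues.get(i); // nm: new map
        Text nk = (Text) nm.firstKey();             // nk: new key
        if (cm.containsKey(nk)) { // increment the value
            // you don't need to remove and re-add, just update the IntWritable
            IntWritable ovTmp = (IntWritable) cm.get(nk);
            ovTmp.set(ovTmp.get() + 1);
        }
        else { // add new entry
            // be sure to create a copy of nk when you add it to the map,
            // because the original Text may be overwritten by the next value
            cm.put(new Text(nk), new IntWritable(1));
        }
    }
}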
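To see why the key is copied before it goes into the map, here is a rough, Hadoop-free sketch of the same merge. Key is a hypothetical mutable class standing in for Text, TreeMap stands in for SortedMapWritable, and run is a made-up driver that refills a single key object to imitate reuse; none of these names come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class MergeDemo {
    // Hypothetical mutable key, standing in for Hadoop's Text.
    static final class Key implements Comparable<Key> {
        String s;
        Key(String s) { this.s = s; }
        Key(Key other) { this.s = other.s; } // copy constructor, like new Text(nk)
        @Override public int compareTo(Key o) { return s.compareTo(o.s); }
        @Override public String toString() { return s; }
    }

    // Merge the first key of each map in 'from' into the map at the same index of 'into'.
    static void merge(List<TreeMap<Key, Integer>> into, List<TreeMap<Key, Integer>> from) {
        for (int i = 0; i < from.size(); i++) {
            if (from.get(i).isEmpty()) continue;                 // nothing to merge here
            while (into.size() <= i) into.add(new TreeMap<>());  // grow the target on demand
            TreeMap<Key, Integer> cm = into.get(i);
            Key nk = from.get(i).firstKey();
            Integer old = cm.get(nk);
            if (old != null) {
                cm.put(nk, old + 1);      // key already present: only the count changes
            } else {
                cm.put(new Key(nk), 1);   // store a copy, never the reused object
            }
        }
    }

    // Imitates object reuse: one Key instance is refilled before every merge.
    static TreeMap<Key, Integer> run(String... keys) {
        List<TreeMap<Key, Integer>> total = new ArrayList<>();
        Key reused = new Key("");
        TreeMap<Key, Integer> single = new TreeMap<>();
        single.put(reused, 1);
        List<TreeMap<Key, Integer>> incoming = new ArrayList<>();
        incoming.add(single);
        for (String k : keys) {
            reused.s = k;                 // overwrite in place, as readFields would
            merge(total, incoming);
        }
        return total.get(0);
    }

    public static void main(String[] args) {
        System.out.println(run("a", "b", "a")); // prints {a=2, b=1}
    }
}
```

If merge stored nk itself rather than a copy, every entry in the aggregate would share the one reused key object, and the sorted map's ordering would break as soon as that object was refilled.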