In my reduce method I want to use a TreeMap variable, reducedMap, to aggregate the incoming keys and values. However, this map loses its state on every call to reduce, and in the end Hadoop only prints the last value put into the TreeMap (plus the test value I added). Why is that? The same approach works as I intend it to in my map method.
public static class TopReducer extends
        Reducer<Text, IntWritable, Text, IntWritable> {

    private TreeMap<Text, Integer> reducedMap = new TreeMap<Text, Integer>();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        int sum = 0;
        String strValues = "";

        for (IntWritable value : values) {
            sum += value.get();
            strValues += value.get() + ", ";
        }

        System.out.println("Map size Before: " + reducedMap);

        Integer val = sum;
        if (reducedMap.containsKey(key))
            val += reducedMap.get(key);

        // Only add, if value is of top 30.
        reducedMap.put(key, val);
        System.out.println("Map size After: " + reducedMap);

        reducedMap.put(new Text("test"), 77777);

        System.out.println("REDUCER: rcv: (" + key + "), " + "(" + sum
                + "), (" + strValues + "):: new (" + val + ")");
    }

    /**
     * Flush top 30 context to the next phase.
     */
    @Override
    protected void cleanup(Context context) throws IOException,
            InterruptedException {
        System.out.println("-----FLUSHING TOP " + TOP_N
                + " MAPPING RESULTS-------");
        System.out.println("MapSize: " + reducedMap);

        int i = 0;
        for (Entry<Text, Integer> entry : entriesSortedByValues(reducedMap)) {
            System.out.println("key " + entry.getKey() + ", value "
                    + entry.getValue());
            context.write(entry.getKey(), new IntWritable(entry.getValue()));

            if (i >= TOP_N)
                break;
            else
                i++;
        }
    }
}
Answer 0 (score: 3):
Hadoop re-uses object references for efficiency, so when you call reducedMap.put(key, val), the key will match a key that is already in the map (because Hadoop has just replaced the contents of that same key object, rather than handing you a new reference to a new object with the new contents). It is effectively the same as calling the following:
Text key = new Text("x");
reducedMap.put(key, val); // map will be of size 1

key.set("y");
reducedMap.put(key, val); // map will still be of size 1,
                          // as it will be comparing key to itself
                          // and just updating the mapped value val
You need to make a deep copy of the key before you put it into the map:
reducedMap.put(new Text(key), val)
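
As a minimal sketch of how this fits into the reducer from the question (keeping the rest of the TopReducer class unchanged, and leaving out the debug printing), the reduce method with the deep copy might look like this:

    // Sketch: reduce method with the deep-copy fix applied.
    @Override
    public void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }

        Integer val = sum;
        if (reducedMap.containsKey(key))
            val += reducedMap.get(key);

        // Copy the key so the TreeMap entry is not mutated when Hadoop
        // re-uses the same Text instance on the next reduce call.
        reducedMap.put(new Text(key), val);
    }

With the copied key, each distinct key gets its own entry in reducedMap, and the cleanup method can then emit the accumulated top-N results as intended.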