Why does the TreeMap reset after every reduce method call?

Asked: 2013-11-28 01:01:25

Tags: hadoop

In my reduce method I want to use a TreeMap field, reducedMap, to aggregate the incoming keys. However, the map loses its state on every reduce() invocation, and in the end Hadoop prints only the last value put into the TreeMap (plus the test value I add). Why is that? The same pattern works as intended in my map method.

public static class TopReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        private TreeMap<Text, Integer> reducedMap = new TreeMap<Text, Integer>();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {

            int sum = 0;
            String strValues = "";
            for (IntWritable value : values) {
                sum += value.get();
                strValues += value.get() + ", ";
            }
            System.out.println("Map size Before: " +reducedMap);
            Integer val = sum;
            if (reducedMap.containsKey(key))
                val += reducedMap.get(key);
            // Only add, if value is of top 30.
            reducedMap.put(key, val);
            System.out.println("Map size After: " +reducedMap);
            reducedMap.put(new Text("test"), 77777);

            System.out.println("REDUCER: rcv: (" + key + "), " + "(" + sum
                    + "), (" + strValues + "):: new (" + val + ")");
        }

        /**
         * Flush top 30 context to the next phase.
         */
        @Override
        protected void cleanup(Context context) throws IOException,
                InterruptedException {
            System.out.println("-----FLUSHING TOP " + TOP_N
                    + " MAPPING RESULTS-------");
            System.out.println("MapSize: " + reducedMap);
            int i = 0;
            for (Entry<Text, Integer> entry : entriesSortedByValues(reducedMap)) {
                System.out.println("key " + entry.getKey() + ", value "
                        + entry.getValue());
                context.write(entry.getKey(), new IntWritable(entry.getValue()));

                if (i >= TOP_N)
                    break;
                else
                    i++;
            }
        }
    }

1 Answer:

Answer 0 (score: 3)

Hadoop reuses object references for efficiency: it passes the same Text instance into every reduce() call, just overwriting its contents each time. So when you call reducedMap.put(key, val), the key you insert is the very object that is already stored in the map (Hadoop has replaced the contents of the key object rather than handing you a new reference to a new object with the new contents). The put therefore matches the existing entry and only replaces its mapped value. It is effectively the same as doing the following:

Text key = new Text("x");
reducedMap.put(key, val); // map will be of size 1
key.set("y");
reducedMap.put(key, val); // map will still be of size 1
                          // because the map compares key to itself
                          // and just updates the mapped value val
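
Hadoop does this deliberately: allocating a fresh object for every record in a job that processes millions of records would create enormous garbage-collection pressure, so the framework deserializes each key and value into the same recycled instance. Any code that stores references across reduce() calls must therefore copy whatever it wants to keep.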

You need to deep-copy the key before putting it into the map:

reducedMap.put(new Text(key), val)
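
For reference, a minimal sketch of the reduce method with that fix applied (assuming the same reducedMap field as in the question; the "test" debug entry and the print statements are omitted):

    @Override
    public void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {

        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        Integer val = sum;
        // Lookup by content is safe here: Text.compareTo() compares the
        // underlying bytes, not object identity.
        if (reducedMap.containsKey(key))
            val += reducedMap.get(key);
        // Deep-copy the key so the stored entry is not mutated when
        // Hadoop recycles this Text instance on the next reduce() call.
        reducedMap.put(new Text(key), val);
    }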