Question

I am developing a MapReduce program with Hadoop, and I have this piece of code in my reducer:

public void reduce(Text key, Iterable<TextLongWritable> values,Context context) throws IOException, InterruptedException {

    long word1count = 0;
    List<TextLongWritable> cache = new ArrayList<TextLongWritable>();

    String decade = key.toString().split("\t")[0];
    String word1 = key.toString().split("\t")[1];

    for (TextLongWritable val : values) {
        if (val.getWord().equals("*")){
            word1count += val.getCount();
            continue;
        }
        cache.add(val);
        log.info("***Reducer*** Word1: " + word1 + "  Word2: " + val.getWord());
    }

    context.write(key, new Text("" + word1count));

    for (TextLongWritable value : cache) {
        if (value.getWord().equals("*")){
            continue;
        }
        log.info("***Reducer*** Word1: " + word1 + "  Word2: " + value.getWord());
        context.write(new Text(decade + "\t" + value.getWord()), new Text(word1 + " " + value.getCount() + "\t" + word1count));
    }

}

First, I am using a cache, as I saw here, so that I can iterate over the values twice.

My problem is that in the second loop, all the values stay the same. For example, suppose my list contains the words one, two and three, and the key is 1900 test, so word1 = "test".

The first logger output is:

***Reducer*** Word1: test  Word2: one
***Reducer*** Word1: test  Word2: two
***Reducer*** Word1: test  Word2: three

But the second logger output is:

***Reducer*** Word1: test  Word2: one
***Reducer*** Word1: test  Word2: one
***Reducer*** Word1: test  Word2: one

For some reason the value stays the same. What am I doing wrong here? Does it have something to do with Hadoop?

Answer 1

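Hadoop reuses the same value object during deserialization to reduce allocation and GC overhead: each step of the values iterator refills that one instance instead of creating a new one. You have to clone or deep copy your TextLongWritable before putting it into a collection.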
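To see why every cached entry ends up identical, here is a plain-Java simulation that does not use Hadoop at all. `WordCount` and `reusingValues` are hypothetical names; the iterator only mimics the object reuse described above and is not Hadoop's actual implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ObjectReuseDemo {

    // Hypothetical mutable value type standing in for TextLongWritable.
    static final class WordCount {
        String word;
        long count;

        WordCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        // Copy constructor: produces an independent object.
        WordCount(WordCount other) {
            this(other.word, other.count);
        }
    }

    // Hypothetical Iterable that mimics the reuse: every next() call refills
    // one shared WordCount instance instead of allocating a new one.
    static Iterable<WordCount> reusingValues(String[] words, long[] counts) {
        WordCount shared = new WordCount(null, 0);
        return () -> new Iterator<WordCount>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < words.length;
            }

            @Override
            public WordCount next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.word = words[i];
                shared.count = counts[i];
                i++;
                return shared;
            }
        };
    }

    // Caches the values (by reference or by copy), then reads the words back.
    static List<String> cachedWords(boolean copy) {
        String[] words = {"one", "two", "three"};
        long[] counts = {1, 2, 3};
        List<WordCount> cache = new ArrayList<>();
        for (WordCount v : reusingValues(words, counts)) {
            cache.add(copy ? new WordCount(v) : v);
        }
        List<String> result = new ArrayList<>();
        for (WordCount v : cache) {
            result.add(v.word);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("by reference: " + cachedWords(false));
        System.out.println("by copy:      " + cachedWords(true));
    }
}
```

Running `main` prints `[three, three, three]` for the reference cache and `[one, two, three]` for the copied one. In this simulation the shared object keeps the last value it was given; in the question every entry shows `one` instead, so which value you see depends on what the framework last wrote into the reused instance. The part that carries over is that all cached entries point at the same object.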

Answer 2

I managed to solve this by referring to this page. I had actually gone through all the cases there first; my code matches the second wrong example on that page.

The "managing iterator in mapreduce" post explains some of what is happening here.

So what I had to do was make a deep copy of each value before adding it to the cache.

For completeness, here is my working code:

public void reduce(Text key, Iterable<TextLongWritable> values,Context context) throws IOException, InterruptedException {

    long word1count = 0;
    List<TextLongWritable> cache = new ArrayList<TextLongWritable>();

    String decade = key.toString().split("\t")[0];
    String word1 = key.toString().split("\t")[1];

    for (TextLongWritable val : values) {
        if (val.getWord().equals("*")){
            word1count += val.getCount();
            continue;
        }
        TextLongWritable val_copy = new TextLongWritable(val.getWord(),val.getCount());
        cache.add(val_copy);
    }

    context.write(key, new Text("" + word1count));

    for (TextLongWritable value : cache) {
        context.write(new Text(decade + "\t" + value.getWord()), new Text(word1 + " " + value.getCount() + "\t" + word1count));
    }
}
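A copy constructor, as in the working code above, is the simplest fix when you control the class. A more generic option is to copy a Writable-style object by serializing it and reading the bytes back into a fresh instance; Hadoop provides `WritableUtils.clone(orig, conf)` as a ready-made helper for this kind of copy (check the documentation for your Hadoop version). The sketch below shows the idea in plain Java so it runs without Hadoop: `WritableLike` is a hypothetical stand-in for `org.apache.hadoop.io.Writable`, and `WordCount` is a hypothetical version of `TextLongWritable` that uses `writeUTF` rather than Hadoop's `Text` encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class SerializationCopyDemo {

    // Hypothetical stand-in for org.apache.hadoop.io.Writable.
    interface WritableLike {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical version of TextLongWritable: a word plus a count.
    static final class WordCount implements WritableLike {
        String word = "";
        long count;

        WordCount() {}

        WordCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(word);
            out.writeLong(count);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            word = in.readUTF();
            count = in.readLong();
        }
    }

    // Deep copy: serialize src to bytes, then deserialize into the fresh dst.
    static <T extends WritableLike> T deepCopy(T src, T dst) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                src.write(out);
            }
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                dst.readFields(in);
            }
            return dst;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        WordCount reused = new WordCount("one", 1);
        WordCount copy = deepCopy(reused, new WordCount());

        // Simulate the framework refilling the reused object with the next value.
        reused.word = "two";
        reused.count = 2;

        System.out.println(copy.word + " " + copy.count); // prints "one 1"
    }
}
```

Because the copy has its own fields, later changes to the reused object do not affect it. For a small class like this one the copy constructor is cheaper and clearer; the serialization route mainly pays off when you want one helper for many Writable types.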

Second iteration: values stay the same
