Question

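I'm putting together a very basic programming assignment using Hadoop, and I'm going with the classic wordcount problem.

I've put a sample file on HDFS and tried to run wordcount on it. The mapper runs just fine, but the reducer gets stuck at 70% and never moves forward.

I also tried it with a file on the local file system, and the behavior is the same.

What could I be doing wrong? Here are the map and reduce functions: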

public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
    // TODO Auto-generated method stub
    String line = value.toString();

    String[] lineparts = line.split(",");

    for(int i=0; i<lineparts.length; ++i)
    {
        output.collect(new Text(lineparts[i]), new IntWritable(1));
    }
}

public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
    // TODO Auto-generated method stub
    int count = 0;
    while(values.hasNext())
    {
        count=count+1;
    }
    output.collect(key , new IntWritable(count));
}

Answer 1

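You never call next() on your iterator, so you are basically creating an infinite loop: hasNext() keeps returning true because the iterator never advances.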
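To make the loop problem concrete, here is a minimal sketch in plain Java, without Hadoop on the classpath. It assumes a `java.util.Iterator<Integer>` as a stand-in for the `Iterator<IntWritable>` that the reducer receives; the class and method names are hypothetical and only for illustration.

```java
import java.util.Arrays;
import java.util.Iterator;

public class IteratorLoopSketch {
    // Counts the elements by consuming the iterator. Each next() call
    // advances it, so hasNext() eventually returns false and the loop ends.
    public static int countValues(Iterator<Integer> values) {
        int count = 0;
        while (values.hasNext()) {
            values.next(); // advance; without this call the loop never terminates
            count = count + 1;
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in for three ("word", 1) pairs arriving at one reduce call.
        Iterator<Integer> values = Arrays.asList(1, 1, 1).iterator();
        System.out.println(countValues(values)); // prints 3
    }
}
```

In the reducer from the question, the equivalent change would be a `values.next()` call inside the `while` body.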

As a side note, the preferred way to implement this word count example is not to increment the count by 1, but to use the value:

IntWritable value = values.next();
count += value.get();

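This way, you can reuse your Reducer as a Combiner, so that partial counts are computed on each mapper and ("wordX", 7) is sent to the reducer instead of 7 occurrences of ("wordX", 1) from a given mapper. You can read more about combiners here.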
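The reason summing matters once a combiner is involved can be sketched in plain Java. This is only a simulation under stated assumptions: the lists stand in for the values of one key, the per-mapper data is made up for illustration, and the class and method names are hypothetical. It does not use the Hadoop API.

```java
import java.util.Arrays;
import java.util.List;

public class CombinerSketch {
    // Reduce logic that sums the values, as recommended above.
    public static int sumValues(List<Integer> values) {
        int count = 0;
        for (int v : values) {
            count += v;
        }
        return count;
    }

    // Reduce logic that only counts how many values arrived.
    public static int countValues(List<Integer> values) {
        return values.size();
    }

    public static void main(String[] args) {
        // Hypothetical data: mapper A emitted ("wordX", 1) seven times, mapper B three times.
        List<Integer> mapperA = Arrays.asList(1, 1, 1, 1, 1, 1, 1);
        List<Integer> mapperB = Arrays.asList(1, 1, 1);

        // A combiner applies the reduce logic to each mapper's output first,
        // so the reducer sees one partial count per mapper: 7 and 3.
        List<Integer> combined = Arrays.asList(sumValues(mapperA), sumValues(mapperB));

        System.out.println(sumValues(combined));   // prints 10, the true total
        System.out.println(countValues(combined)); // prints 2, which is wrong
    }
}
```

With the old `org.apache.hadoop.mapred` API that the question uses, the reducer class is typically registered as the combiner through `JobConf.setCombinerClass`, assuming the job is configured with a `JobConf`.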

Reducer stuck at 70%
