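I am trying the WordCount Java example in MapReduce. After the reduce method has run, I want to output only the single word that occurs the most times.
To do this, I created some class-level variables named myoutput, mykey and completeSum.
I write this data out in the close method, but in the end I get an unexpected result.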
    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class WordCount {

        public static class Map extends MapReduceBase implements
                Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
                    throws IOException {
                String line = value.toString();
                StringTokenizer tokenizer = new StringTokenizer(line);
                // Emit (word, 1) for every token in the line
                while (tokenizer.hasMoreTokens()) {
                    word.set(tokenizer.nextToken());
                    output.collect(word, one);
                }
            }
        }

        // Class-level state used to remember the most frequent word
        static int completeSum = -1;
        static OutputCollector<Text, IntWritable> myoutput;
        static Text mykey = new Text();

        public static class Reduce extends MapReduceBase implements
                Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterator<IntWritable> values,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
                    throws IOException {
                int sum = 0;
                while (values.hasNext()) {
                    sum += values.next().get();
                }
                // Keep a reference to the key with the largest sum seen so far
                if (completeSum < sum) {
                    completeSum = sum;
                    myoutput = output;
                    mykey = key;
                }
            }

            @Override
            public void close() throws IOException {
                super.close();
                // Emit only the most frequent word, after all reduce() calls
                myoutput.collect(mykey, new IntWritable(completeSum));
            }
        }

        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(WordCount.class);
            conf.setJobName("wordcount");
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            conf.setMapperClass(Map.class);
            // conf.setCombinerClass(Reduce.class);
            conf.setReducerClass(Reduce.class);
            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);
        }
    }
Input:

    one
    three three three
    four four four four
    six six six six six six six six six six six six six six six six six six
    five five five five five
    seven seven seven seven seven seven seven seven seven seven seven seven seven

Output:

    six 18
    three 18
From the result I can see that the sum is correct, but the key is not.
Answer 0 (score: 1)
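The problem you are observing is caused by reference aliasing. The object referenced by key is reused by the framework and refilled with new contents on each call, so mykey, which points to that same object, changes too. It ends up holding the last key that was reduced. Copying the object avoids this, for example:

    mykey = new Text(key);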
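The aliasing effect can be reproduced without Hadoop. The sketch below is a plain-Java simulation under stated assumptions, not Hadoop code: the hypothetical MutableKey class stands in for Text, and the loop stands in for the framework refilling one key object before each reduce call, with keys arriving in sorted order. Keeping the reference ends with the last key, while copying keeps the correct one.

```java
public class KeyAliasingDemo {

    // Hypothetical stand-in for Hadoop's Text: a mutable holder that the
    // simulated "framework" below refills in place instead of reallocating.
    public static final class MutableKey {
        private String value = "";

        public MutableKey() {}

        // Copy constructor, playing the role of new Text(key)
        public MutableKey(MutableKey other) {
            this.value = other.value;
        }

        public void set(String v) {
            this.value = v;
        }

        @Override
        public String toString() {
            return value;
        }
    }

    // Simulates a reduce loop that reuses a single key instance for every group
    // and returns the word with the largest count, either aliasing or copying the key.
    public static String maxKey(String[] words, int[] counts, boolean copyKey) {
        MutableKey reused = new MutableKey();
        MutableKey best = null;
        int bestCount = -1;
        for (int i = 0; i < words.length; i++) {
            reused.set(words[i]); // the same object is overwritten each time
            if (counts[i] > bestCount) {
                bestCount = counts[i];
                best = copyKey ? new MutableKey(reused) : reused;
            }
        }
        return best == null ? null : best.toString();
    }

    public static void main(String[] args) {
        // Word counts from the question's input, in sorted key order
        String[] words = {"five", "four", "one", "seven", "six", "three"};
        int[] counts = {5, 4, 1, 13, 18, 3};
        System.out.println("aliased: " + maxKey(words, counts, false)); // prints "aliased: three"
        System.out.println("copied:  " + maxKey(words, counts, true));  // prints "copied:  six"
    }
}
```

The aliased run reports three, the alphabetically last key, which matches the wrong key in the question's output.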
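However, you should take the result only from the output files, because static variables cannot be shared between different nodes in a distributed cluster. This approach only works in standalone mode, which defeats the purpose of MapReduce.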
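A minimal sketch of reading the result from the output instead, assuming the job's part-* files have first been copied to a local directory (for example with `hadoop fs -get`) and use TextOutputFormat's default tab between key and value. The class name MaxFromOutput is hypothetical. Because each word is sent to exactly one reducer, scanning every part file yields the global maximum even with several reducers.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class MaxFromOutput {

    // Hypothetical helper: scans "word<TAB>count" lines and returns the line
    // with the highest count (first one wins on ties), or null if none parse.
    public static String maxLine(List<String> lines) {
        String best = null;
        long bestCount = Long.MIN_VALUE;
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip lines without a key/value separator
            }
            long count = Long.parseLong(line.substring(tab + 1).trim());
            if (count > bestCount) {
                bestCount = count;
                best = line;
            }
        }
        return best;
    }

    // Usage (assumption): java MaxFromOutput <local copy of the job output directory>
    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        try (DirectoryStream<Path> parts =
                Files.newDirectoryStream(Paths.get(args[0]), "part-*")) {
            for (Path part : parts) {
                lines.addAll(Files.readAllLines(part));
            }
        }
        System.out.println(maxLine(lines));
    }
}
```

The job itself stays an ordinary WordCount; the "most frequent word" step runs once, after the job, on its complete output.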
Finally, even in standalone mode, using global variables will in most cases lead to race conditions if parallel local tasks are used (see MAPREDUCE-1367 and MAPREDUCE-434).
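To illustrate why per-task state avoids that race, here is a plain-Java sketch that assumes parallel local tasks behave like threads in one JVM; it is not the Hadoop local runner. Each hypothetical TaskMax keeps its best word in instance fields rather than static ones, and the results are merged only after Thread.join(), so no variable is written by two threads at once.

```java
import java.util.ArrayList;
import java.util.List;

public class PerTaskMax {

    // Simulated task (assumption: one thread per local task). Its state lives in
    // instance fields, unlike the static fields in the question.
    static final class TaskMax implements Runnable {
        private final String[] words;
        private final int[] counts;
        String bestWord;
        int bestCount = -1;

        TaskMax(String[] words, int[] counts) {
            this.words = words;
            this.counts = counts;
        }

        @Override
        public void run() {
            for (int i = 0; i < words.length; i++) {
                if (counts[i] > bestCount) {
                    bestCount = counts[i];
                    bestWord = words[i]; // String is immutable, so no aliasing issue
                }
            }
        }
    }

    // Runs each partition in its own thread, then merges the per-task results.
    public static String globalMax(List<String[]> wordParts, List<int[]> countParts) {
        List<TaskMax> tasks = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (int p = 0; p < wordParts.size(); p++) {
            TaskMax task = new TaskMax(wordParts.get(p), countParts.get(p));
            Thread t = new Thread(task);
            tasks.add(task);
            threads.add(t);
            t.start();
        }
        String best = null;
        int bestCount = -1;
        for (int p = 0; p < tasks.size(); p++) {
            try {
                threads.get(p).join(); // join() makes the task's writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for a task", e);
            }
            TaskMax task = tasks.get(p);
            if (task.bestCount > bestCount) {
                bestCount = task.bestCount;
                best = task.bestWord;
            }
        }
        return best == null ? null : best + "\t" + bestCount;
    }
}
```

The merge step plays the same role as reading all part files after the job: combining happens once, after every task has finished.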