I'm really new to all this Hadoop stuff; all I have is Wordcount.java, packaged and running. I just want to know how to output the 100 most frequent words in a file.
I tried using a TreeMap (the idea, as far as I understand it, is a second job that reads the word<TAB>count output of the word-count job and keeps only the top entries), but I still don't get it. Could someone give me a solution and explain it to me, please?
// File-level imports the classes below need:
import java.io.IOException;
import java.util.TreeMap;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public static class Map extends Mapper<LongWritable, Text, NullWritable, Text> {
    // Keeps the 100 entries with the highest counts seen by this mapper,
    // sorted ascending by count. Caveat: two words with the same count
    // collide on the TreeMap key, so one of them is silently dropped.
    private TreeMap<Integer, Text> words = new TreeMap<Integer, Text>();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line is "word<TAB>count" from the word-count job's output.
        String[] v = value.toString().split("\t");
        int count = Integer.parseInt(v[1]);
        words.put(count, new Text(value)); // copy: Hadoop reuses the value object
        if (words.size() > 100) {
            words.remove(words.firstKey()); // evict the smallest count
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // Emit this mapper's local top 100 under a single key so one
        // reducer can merge them into the global top 100.
        for (Text wrd : words.values()) {
            context.write(NullWritable.get(), wrd);
        }
    }
}
public static class Reduce extends Reducer<NullWritable, Text, NullWritable, Text> {
    @Override
    public void reduce(NullWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Merge every mapper's local top 100 into one global top 100.
        TreeMap<Integer, Text> words = new TreeMap<Integer, Text>();
        for (Text value : values) {
            String[] v = value.toString().split("\t");
            int count = Integer.parseInt(v[1]);
            words.put(count, new Text(value));
            if (words.size() > 100) {
                words.remove(words.firstKey());
            }
        }
        // Output in ascending order of count (TreeMap iterates smallest first).
        for (Text t : words.values()) {
            context.write(NullWritable.get(), t);
        }
    }
}
I don't even know whether this code is put together correctly, but when I run it, all I get is every word with its count.
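In case it matters, below is a minimal driver sketch of how I understand the second (top-100) job should be wired up; the class name TopWords and the path arguments are placeholders I made up, not my actual code. I've read that setNumReduceTasks(1) is needed here so the top 100 is computed globally rather than once per reducer.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TopWords {
    // Map and Reduce are the classes shown above, nested in this class.

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "top 100 words");
        job.setJarByClass(TopWords.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        job.setNumReduceTasks(1); // one reducer -> one global top-100 list
        FileInputFormat.addInputPath(job, new Path(args[0]));   // word-count output dir
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // top-100 output dir
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Is that right, or is the TreeMap approach itself the problem (for example, two words with the same count overwriting each other)?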