How to output the top 100 results in Hadoop

Time: 2014-12-21 19:47:58

Tags: hadoop

I'm really new to all this Hadoop stuff; all I have is Wordcount.java compiled and running. I'm just wondering how to output the top 100 most frequent words in a file.

I've tried using a TreeMap, but I still don't understand it. Could someone give me a solution and walk me through it, please?

    import java.io.IOException;
    import java.util.TreeMap;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public static class Map extends Mapper<LongWritable, Text, NullWritable, Text> {

        // sorted by count, ascending; firstKey() is the smallest count held so far
        // (note: two words with the same count overwrite each other in this map)
        private TreeMap<Integer, Text> words = new TreeMap<Integer, Text>();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // each input line is "word<TAB>count", the output of the word-count job
            String[] v = value.toString().split("\t");
            int count = Integer.parseInt(v[1]);

            // copy the Text: Hadoop reuses the value object between calls
            words.put(count, new Text(value));

            // once we hold more than 100 entries, evict the smallest count
            if (words.size() > 100) {
                words.remove(words.firstKey());
            }
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            // emit this mapper's local top 100 under a single null key
            for (Text wrd : words.values()) {
                context.write(NullWritable.get(), wrd);
            }
        }
    }

    public static class Reduce extends Reducer<NullWritable, Text, NullWritable, Text> {

        @Override
        public void reduce(NullWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            TreeMap<Integer, Text> words = new TreeMap<Integer, Text>();

            // every mapper's candidates arrive under the same null key;
            // run the job with a single reducer so this sees all of them
            for (Text value : values) {
                String[] v = value.toString().split("\t");
                int count = Integer.parseInt(v[1]);

                words.put(count, new Text(value));

                if (words.size() > 100) {
                    words.remove(words.firstKey());
                }
            }

            // write the global top 100, smallest count first
            for (Text t : words.values()) {
                context.write(NullWritable.get(), t);
            }
        }
    }

I don't even know whether this code is written correctly, but when I run it, all I get is every word with its count.
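For what it's worth, the TreeMap trick from the code above can be tried in plain Java, without Hadoop at all. This is a minimal sketch with made-up sample counts, keeping the top 3 instead of 100 so the eviction is visible:

```java
import java.util.Map;
import java.util.TreeMap;

public class TopN {
    public static void main(String[] args) {
        // sample "word<TAB>count" lines, as a word-count job would emit (made-up data)
        String[] lines = {"apple\t5", "banana\t2", "cherry\t9", "date\t7"};
        int n = 3; // keep the top 3 for this small example

        // TreeMap sorts its Integer keys ascending, so firstKey() is the smallest count
        TreeMap<Integer, String> top = new TreeMap<Integer, String>();
        for (String line : lines) {
            String[] parts = line.split("\t");
            top.put(Integer.parseInt(parts[1]), parts[0]);
            if (top.size() > n) {
                top.remove(top.firstKey()); // evict the current smallest count
            }
        }

        // prints the survivors in ascending count order; "banana" (count 2) was evicted
        for (Map.Entry<Integer, String> e : top.entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

One caveat this small demo makes obvious: because the map is keyed on the count, two words with the same count collide and one silently replaces the other; a `TreeMap<Integer, List<String>>` (or a comparator over word-count pairs) avoids that.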

0 Answers:

No answers yet