Question: Something like a HashMap, but sorted?

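I'm writing a Java program that parses all the words in a text file and adds them to a HashMap. I need to count how many distinct words the file contains. I also need to find the word that occurs most often. The HashMap maps each word to an Integer holding the number of times that word occurs.

Is there something like a HashMap that could help me sort this?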

Answer 1

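Here is the manual way:

Create a compound WordCount class with word and count fields.
Create a Comparator for that class that orders by count.
Once the HashMap is filled, create a new list of WordCount objects from the HashMap's entries.
Sort the list with the Comparator.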
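The four steps above might look roughly like this in plain Java. This is a minimal sketch assuming Java 8+ and that the counts are already in a Map<String, Integer>; WordCount matches the answer's wording, while WordCountSort and sortByCount are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Step 1: compound class holding one word and its count.
final class WordCount {
    final String word;
    final int count;

    WordCount(String word, int count) {
        this.word = word;
        this.count = count;
    }

    @Override
    public String toString() {
        return word + "=" + count;
    }
}

final class WordCountSort {
    // Step 2: comparator that orders by count, highest first.
    static final Comparator<WordCount> BY_COUNT_DESC =
            Comparator.comparingInt((WordCount wc) -> wc.count).reversed();

    // Steps 3 and 4: build WordCount objects from the filled map, then sort the list.
    static List<WordCount> sortByCount(Map<String, Integer> counts) {
        List<WordCount> list = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            list.add(new WordCount(e.getKey(), e.getValue()));
        }
        list.sort(BY_COUNT_DESC);
        return list;
    }
}
```

After sorting, the most frequent word is list.get(0).word, and the number of distinct words is simply the map's size().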

Answer 2

You could use HashMultiset from google-collections (now part of Guava):

import com.google.common.collect.*;
import com.google.common.collect.Multiset.Entry;

...

  final Multiset<String> words = HashMultiset.create();
  words.addAll(...);

  Ordering<Entry<String>> byIncreasingCount = new Ordering<Entry<String>>() {
    @Override public int compare(Entry<String> a, Entry<String> b) {
      // subtraction is safe here because counts are never negative
      return a.getCount() - b.getCount();
    }
  };

  Entry<String> maxEntry = byIncreasingCount.max(words.entrySet());
  return maxEntry.getElement();

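Edit: Oops, I thought you only wanted the single most common word. But it sounds like you want several of the most common ones, so you can use sortedCopy instead of max, and you now have a list of all the entries in order.

To find the number of distinct words: words.elementSet().size()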

Answer 3

If you want the Map sorted by word, TreeMap is Java's built-in answer. You can either make your Word objects Comparable or supply a custom Comparator.

SortedMap<Word,Integer> map = new TreeMap<Word,Integer>();
...
for (Word word : words) {   // "words" stands for whatever collection of parsed words you have
    Integer count = map.get(word);
    if (count == null) count = 0;
    map.put(word, count + 1);
}
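A runnable version of the loop above, with String standing in for the hypothetical Word type (String is already Comparable, so no custom Comparator is needed). It swaps the get/null-check/put sequence for Map.merge (Java 8+), which does the same counting in one call; the class and method names are illustrative.

```java
import java.util.SortedMap;
import java.util.TreeMap;

final class TreeMapWordCount {
    // Counts words into a TreeMap, so iteration order is alphabetical by word.
    static SortedMap<String, Integer> count(String[] words) {
        SortedMap<String, Integer> map = new TreeMap<>();
        for (String word : words) {
            // merge stores 1 for a new word, otherwise adds 1 to the existing count
            map.merge(word, 1, Integer::sum);
        }
        return map;
    }
}
```

Iterating over the result visits the words in alphabetical order, and map.size() is the number of distinct words. The map is sorted by key only; it says nothing about which word is most frequent.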

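If you want to sort by frequency, you're better off doing it after all the words have been counted. Sorted collections don't take kindly to having their order messed up by external changes. Sorting by frequency requires a compound word+count object, as others have posted.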
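One way to follow this advice with only the standard library: finish counting in an ordinary HashMap, then copy its entries into a list and sort that list once. Here Map.Entry plays the role of the word+count pair. A minimal sketch assuming Java 8+; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class SortAfterCounting {
    // Returns the map's entries ordered by count, highest first.
    // Call this only once counting is finished; the map itself is left unchanged.
    static List<Map.Entry<String, Integer>> byFrequency(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.comparingByValue(Comparator.reverseOrder()));
        return entries;
    }
}
```

Because the sort happens on a separate list after counting, no count ever changes while the element sits inside a sorted structure.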

Answer 4

Here is a Groovy version of the most popular answer to this question:

import com.google.common.collect.Multiset
import com.google.common.collect.Ordering

List leastCommon(Multiset myMultiset, Integer quantity)
{

    Ordering<Multiset.Entry<String>> byIncreasingCount = new Ordering<Multiset.Entry<String>>() {
      @Override public int compare(Multiset.Entry<String> a, Multiset.Entry<String> b) {
          return a.getCount() - b.getCount()
      }
    }

    // subList's end index is exclusive, so the upper bound is size(), not size() - 1
    int maxIndex = Math.min(quantity, myMultiset.entrySet().size())
    return byIncreasingCount.sortedCopy(myMultiset.entrySet()).subList(0, maxIndex)

}

List mostCommon(Multiset myMultiset, Integer quantity)
{

    Ordering<Multiset.Entry<String>> byDecreasingCount = new Ordering<Multiset.Entry<String>>() {
      @Override public int compare(Multiset.Entry<String> a, Multiset.Entry<String> b) {
          return b.getCount() - a.getCount()
      }
    }

    // subList's end index is exclusive, so the upper bound is size(), not size() - 1
    int maxIndex = Math.min(quantity, myMultiset.entrySet().size())
    return byDecreasingCount.sortedCopy(myMultiset.entrySet()).subList(0, maxIndex)

}
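For comparison, a plain-Java sketch of mostCommon that needs no Guava, assuming Java 8+ and counts held in a Map<String, Integer>. Stream.limit simply stops at the end of the stream, so there is no subList bound to get wrong. TopWords is a hypothetical name; a leastCommon variant would drop the reversed() call.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class TopWords {
    // Returns up to `quantity` words, most frequent first.
    static List<String> mostCommon(Map<String, Integer> counts, int quantity) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(quantity)            // asking for more than exist just returns them all
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

The order among words with equal counts is not specified here; add a thenComparing on the key if you need a stable result.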

Answer 5

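It looks like the TreeBag class in the Commons Collections library might do what you want. It keeps track of how many copies of each object have been added. Note, however, that TreeBag keeps its members sorted by their natural ordering (or a supplied Comparator), not by count, so last() returns the greatest element rather than the most frequent one; to find the highest count you still have to check getCount() for each member of uniqueSet(). One caveat: Commons Collections 3.x does not use generics, so you may get a lot of compiler warnings when using it (Commons Collections 4 added generics).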

Answer 6

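For the count, put the words into a Set and take its size when you're done.

For the highest value, iterate over all the entries and keep the key with the highest value.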
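Both steps from this answer in one sketch: a HashSet gives the distinct-word count, and a single pass over the map's entries finds the most frequent word. WordStats and its methods are hypothetical names, and the map is assumed to be the HashMap<String, Integer> from the question (whose own size() would give the same distinct count).

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class WordStats {
    // Number of distinct words: add every word to a Set, then take its size.
    static int distinctCount(Iterable<String> words) {
        Set<String> seen = new HashSet<>();
        for (String w : words) {
            seen.add(w);
        }
        return seen.size();
    }

    // Most frequent word: iterate all entries, remembering the key with the highest value.
    // Returns null for an empty map; on ties, the first entry encountered wins.
    static String mostFrequent(Map<String, Integer> counts) {
        String bestWord = null;
        int bestCount = Integer.MIN_VALUE;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                bestCount = e.getValue();
                bestWord = e.getKey();
            }
        }
        return bestWord;
    }
}
```

This is a single O(n) pass with no sorting, which is all you need when only the top word matters.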

Answer 7

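Have you looked at java.util.PriorityQueue? A PriorityQueue is basically a list with a priority attached to each element (implemented as an unsynchronized priority heap). Each time you read a new string, you can add it, or raise its priority by 1 if it's already present (logarithmic time). The presence check is linear time, and in the end this will be very easy to use. To get the most frequent word, just call poll() once you're done!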

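Edit: the standard PriorityQueue doesn't let you edit priorities directly, because its ordering comes from a comparator. You're better off with a simple hash-based implementation, or something like this.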
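Given the limitation in the edit, one workable pattern (a change from the incremental approach the answer first describes) is to finish counting in a HashMap and only then load the entries into a PriorityQueue ordered by count, so no priority ever has to change. A minimal sketch assuming Java 8+; HeapTopWords and topK are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class HeapTopWords {
    // Returns the k most frequent words, highest count first.
    static List<String> topK(Map<String, Integer> counts, int k) {
        // Max-heap on count: the comparator puts larger counts at the head of the queue.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        heap.addAll(counts.entrySet());   // counts are final here, so priorities never change
        List<String> result = new ArrayList<>();
        while (result.size() < k && !heap.isEmpty()) {
            result.add(heap.poll().getKey());  // poll() removes the current maximum
        }
        return result;
    }
}
```

Adding n entries one by one costs O(n log n), and each poll() is O(log n), so pulling only the first few words avoids a full sort of the result.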

Answer 8

YourBean implements Comparable<YourBean>
compareTo method: order by word count
TreeMap instead of HashMap
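A sketch of the bean this answer outlines, under two assumptions it leaves unstated. First, counting still happens in a HashMap, because a sorted collection cannot reorder an element whose count changes after insertion. Second, compareTo breaks ties on the word, because a TreeMap or TreeSet treats compareTo() == 0 as the same key and would silently drop words with equal counts. YourBean is the answer's placeholder name; the sorted helper is hypothetical, and it uses a TreeSet since the bean already carries its count.

```java
import java.util.Map;
import java.util.TreeSet;

// Placeholder bean from the answer: one word plus its count.
final class YourBean implements Comparable<YourBean> {
    final String word;
    final int count;

    YourBean(String word, int count) {
        this.word = word;
        this.count = count;
    }

    // Orders by count, highest first; ties are broken alphabetically by word so that
    // distinct words with the same count are not treated as duplicates.
    @Override
    public int compareTo(YourBean other) {
        int byCount = Integer.compare(other.count, this.count);
        return byCount != 0 ? byCount : this.word.compareTo(other.word);
    }

    // Builds the sorted view once counting is finished.
    static TreeSet<YourBean> sorted(Map<String, Integer> counts) {
        TreeSet<YourBean> set = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            set.add(new YourBean(e.getKey(), e.getValue()));
        }
        return set;
    }
}
```

With this ordering, set.first() is the most frequent word.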
