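Get the top 5 words from a large block of text in Java

Time: 2015-11-19 21:54:21

Tags: java dictionary text

I'm trying to get the five most frequently used words from a large block of text. I've built a map of words, where each value is the number of times that word was used.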

Map<String,Integer> wordHits = new HashMap<String,Integer>();

for(Status status3 : statuses){

    String mdry = status3.getText();
    String[] statusSplitOnSpace = mdry.split(" ");

    // A single pass over the words is enough; looping over the same array
    // twice (nested) would count every word once per word in the status
    for(String str : statusSplitOnSpace){
        if(doesListContainWord(str)){
            incrementKeyofWordInList(str);
        }else{
            if(doesWordCountAsAWord(str)){
                addNewWordToList(str);
            }
        }
    }
}

// Assumes the helper methods above (not shown) update wordHits
for (Map.Entry<String,Integer> entry : wordHits.entrySet()){
      // The counts are Integers, so they must not be cast to String
      System.out.println("Word (" + entry.getKey() + ") was found " + entry.getValue() + " times.");
}

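2 answers:

Answer 0: (score: 1)

Assuming you have the words stored in an array, I would first transfer them into a Map. I believe that is what you were trying to do, but it is hard to tell from your variable names. Once that is done, you can create a custom Comparator so that the Map can be sorted. You could do it like this: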

import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;

public class Solution {
    public static void main(String[] args){
        String[] words = {"word1", "word1", "word2", "word3", "word4", "word5", "word5"};
        Map<String, Integer> wordCounts = new HashMap<>();
        for (String word : words){ //Transfer your words to a map
            if (wordCounts.containsKey(word)){ //If word is already in map, increase value
                wordCounts.put(word, wordCounts.get(word)+1);
            }else{ //If word is not in map, add it to the map
                wordCounts.put(word, 1);
            }
        }
        TreeMap<String, Integer> sortedWordCounts = new TreeMap<>(new ValueComparator(wordCounts));  //Sorts by count, highest first
        sortedWordCounts.putAll(wordCounts); //Add to new map
        NavigableSet<String> keys = sortedWordCounts.descendingKeySet();
        for (int i=0; i<5 && !keys.isEmpty(); i++){ //Stop early if there are fewer than 5 words
            System.out.println(keys.pollLast());  //Prints the top 5 keys; pollLast also removes each one from the map
        }
    }
}
class ValueComparator implements Comparator<String>{
    private Map<String,Integer> map;
    public ValueComparator(Map<String,Integer> map){
        this.map = map;
    }
    @Override
    public int compare(String o1, String o2) {
        int byCount = map.get(o2).compareTo(map.get(o1)); //Higher counts sort first
        //Break ties by key, so compare returns 0 only when both keys are the same word
        return byCount != 0 ? byCount : o1.compareTo(o2);
    }

}

Output

word1
word5
word2
word3
word4

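TreeMap is a Map, but it keeps its entries sorted according to the Comparator you initialize it with. If you don't give it a Comparator, it just sorts by key, which is not what we want. We want to sort by value, so you have to write your own Comparator.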
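To make the difference between key order and value order concrete, here is a minimal sketch. It is not the method from the answer above: instead of a TreeMap with a custom Comparator, it copies the entries into a list and sorts that list by count. The class name `TopWordsSketch`, the method `topWords`, and the sample data are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TopWordsSketch {
    // Hypothetical helper: returns up to n words, highest count first.
    // Words with equal counts are ordered alphabetically.
    static List<String> topWords(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = b.getValue().compareTo(a.getValue()); // descending by count
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // A TreeMap without a Comparator orders its entries by key.
        Map<String, Integer> byKey = new TreeMap<>();
        byKey.put("pear", 3);
        byKey.put("apple", 1);
        byKey.put("fig", 7);
        System.out.println(byKey.keySet());     // prints [apple, fig, pear]
        System.out.println(topWords(byKey, 2)); // prints [fig, pear]
    }
}
```

Sorting a copy of the entries leaves the original map untouched and does not depend on a Comparator that looks values up in another map, which may be easier to reason about if the counts keep changing.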

Answer 1: (score: 1)

Here is a more beginner-level "manual" approach. I haven't tested it, but it should be close...

    // Get sorted Lists of words and counts from the source Map
    List<String> sortedWordsList = new ArrayList<String>();
    List<Integer> sortedCountsList = new ArrayList<Integer>();              
    for( String word : wordCountMap.keySet() ) 
    {
        Integer wordCount = wordCountMap.get(word);

        int insertIndex=0;
        for( int i=0; i != sortedCountsList.size(); ++i )
        {
            if( wordCount > sortedCountsList.get(i) ) break;
            ++insertIndex;  
        }     
        sortedWordsList.add( insertIndex, word );
        sortedCountsList.add( insertIndex, wordCount );
    }

    // Move top 5 words into a new List
    final int TOP_WORDS_TO_FIND_COUNT = 5;        
    List<String> topWordsList = new ArrayList<String>();
    for( int i=0; i != sortedWordsList.size(); ++i )
    {
        topWordsList.add( i, sortedWordsList.get(i) );
        if( i == TOP_WORDS_TO_FIND_COUNT-1 ) break;
    }     

    // Move top 5 counts into a new List
    List<Integer> topCountsList = new ArrayList<Integer>();
    for( int i=0; i != sortedCountsList.size(); ++i )
    {
        topCountsList.add( i, sortedCountsList.get(i) );
        if( i == TOP_WORDS_TO_FIND_COUNT-1 ) break;
    }
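
Since the snippet above is a fragment and was posted untested, a self-contained sketch may help when trying it out. It wraps the same insertion logic in a method; the class name `ManualTopWords`, the method `topWords`, and the sample counts are assumptions added here, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ManualTopWords {
    // Hypothetical wrapper: builds the two parallel sorted lists as in the
    // answer, then returns the first `limit` words.
    static List<String> topWords(Map<String, Integer> wordCountMap, int limit) {
        List<String> sortedWords = new ArrayList<>();
        List<Integer> sortedCounts = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : wordCountMap.entrySet()) {
            int insertIndex = 0;
            // Walk past every stored count that is at least as large as this one.
            while (insertIndex < sortedCounts.size()
                    && entry.getValue() <= sortedCounts.get(insertIndex)) {
                insertIndex++;
            }
            sortedWords.add(insertIndex, entry.getKey());
            sortedCounts.add(insertIndex, entry.getValue());
        }
        // Copy the prefix so the result is independent of the full list.
        return new ArrayList<>(sortedWords.subList(0, Math.min(limit, sortedWords.size())));
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so ties come out in a predictable order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("the", 4);
        counts.put("a", 3);
        counts.put("cat", 1);
        counts.put("sat", 2);
        counts.put("on", 2);
        counts.put("mat", 1);
        System.out.println(topWords(counts, 5)); // prints [the, a, sat, on, cat]
    }
}
```

Words with equal counts keep the order in which the map yields them, so with a plain HashMap the order among ties is not guaranteed.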