I have been trying to find the 25 most common words in a text file.
I have a vague idea of how to do this with a TreeMap, but I'm not sure.
public static String CommonElements(WordStream words){
TreeMap<String, Integer> Map = new TreeMap<String, Integer>();
for(String w: words){
w = w.toLowerCase();
int token = Map.get(w);
if(token != 0){
Map.put(w,token);
}
}
}
The method is supposed to return a list of the 25 most common words in the text file.
Answer 0 (score: 1)
Here is an example using the string "Stackoverflow could help you. Help Help at Stackoverflow.":
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public class WordCount {
public static void main(String[] args) {
String sentence = "Stackoverflow could help you. Help Help at Stackoverflow.";
Stream<String> wordStream = Pattern.compile("\\W").splitAsStream(sentence);
HashMap<String,Integer> unsortedMap = new HashMap<String,Integer>();
// for each word, count how many times it occurs in the word stream
wordStream.forEach((wordReal) -> {
String word = wordReal.toLowerCase();
if (!word.equals("")) {
if (unsortedMap.get(word) == null) {
unsortedMap.put(word, 0);
}
unsortedMap.put(word, unsortedMap.get(word) + 1);
}
});
// sort the map entries by count, descending, and keep that order in a LinkedHashMap
Map<String, Integer> sortedMap =
unsortedMap.entrySet().stream()
.sorted(Map.Entry.comparingByValue((v1,v2)->v2.compareTo(v1)))
.collect(Collectors.toMap(Entry::getKey, Entry::getValue,
(e1, e2) -> e1, LinkedHashMap::new));
// print each word with its count; to keep only the top 25, stop after 25 entries
for (Map.Entry<String, Integer> entry : sortedMap.entrySet()) {
System.out.println("Word : `" + entry.getKey() + "` Count : " + entry.getValue());
}
}
}
Word : `help` Count : 3
Word : `stackoverflow` Count : 2
Word : `at` Count : 1
Word : `could` Count : 1
Word : `you` Count : 1
If you want only 25 results, you just need to stop printing after the first 25 entries, or delete every entry after the first 25.
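One way to apply that limit is sketched below. This is only an illustration, not part of the original answer: the class name `TopWords` and the helper `topWords` are hypothetical, it counts with `Collectors.groupingBy` instead of the manual `HashMap` updates, and it breaks ties alphabetically (an added assumption) so that the order is deterministic.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class TopWords {
    // Hypothetical helper: returns the `limit` most frequent words with their counts,
    // most frequent first. Ties are broken alphabetically (an assumption, not from the answer).
    static Map<String, Long> topWords(String text, int limit) {
        // Split on runs of non-word characters, lowercase, and count each word.
        Map<String, Long> counts = Pattern.compile("\\W+")
                .splitAsStream(text)
                .filter(w -> !w.isEmpty())
                .map(String::toLowerCase)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // Sort by count descending, then by word, and keep only the first `limit` entries.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(limit)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        String sentence = "Stackoverflow could help you. Help Help at Stackoverflow.";
        topWords(sentence, 25).forEach((word, count) ->
                System.out.println("Word : `" + word + "` Count : " + count));
    }
}
```

The `LinkedHashMap` keeps the sorted order, so iterating over the result visits the most frequent word first.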
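Answer 1 (score: 0)
You can use an array of arrays, because that lets you sort by count. Each sub-array would contain two objects:
the word
its number of occurrences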
//Sample object
commonWords = [["most used word",20],["second word",15],...and so on]
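In Java, the same idea could be sketched with a list of (word, count) pairs instead of nested arrays. This is an assumption about how the suggestion might translate, not code from the answer: the class `WordPairs` and the helper `mostCommon` are hypothetical names, and `Map.Entry` stands in for the two-element sub-array.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordPairs {
    // Hypothetical helper: turns a word -> count map into a list of (word, count) pairs,
    // sorted by count descending, and keeps at most `limit` pairs.
    static List<Map.Entry<String, Integer>> mostCommon(Map<String, Integer> counts, int limit) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>(counts.entrySet());
        pairs.sort((a, b) -> b.getValue().compareTo(a.getValue()));
        return new ArrayList<>(pairs.subList(0, Math.min(limit, pairs.size())));
    }

    public static void main(String[] args) {
        // Count words in a small sample text.
        Map<String, Integer> counts = new HashMap<>();
        for (String w : "help me help you help".split("\\s+")) {
            counts.merge(w, 1, Integer::sum);
        }
        for (Map.Entry<String, Integer> pair : mostCommon(counts, 25)) {
            System.out.println(pair.getKey() + " " + pair.getValue());
        }
    }
}
```

Words with equal counts come out in no particular order here, since the sort only compares counts.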