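I'm confused about how to sort words by frequency from an arbitrary text file input. I was able to count the frequency of each word, but I'm not sure how to sort the results so that the most frequent word comes first.
Here is what I have for the frequencies.
For example, here is the text of a file I read: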
This is a test run.
I come with no wrapping or pretty pink bows.
I am who I am from my head to my toes.
I tend to get loud when speaking my mind.
Even a little crazy some of the time.
I. am. who. I. am.
The output of my code for this file is:
2 a
2 am
2 am.
1 bows.
1 come
1 crazy
1 even
1 from
1 get
1 head
4 i
2 i.
1 is
1 little
1 loud
1 mind.
3 my
1 no
1 of
1 or
1 pink
1 pretty
1 run.
1 some
1 speaking
1 tend
1 test
1 the
1 this
1 time.
2 to
1 toes.
1 when
1 who
1 who.
1 with
1 wrapping
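I would also like to know how to ignore the period, because some identical words are counted as different words when a period is attached (for example "am" and "am.").
```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

public class WordFrequency {
    public static void main(String[] args) {
        // Path of the text file to read (assumed to be the first command-line argument)
        File file = new File(args[0]);
        // try-with-resources closes the Scanner when we are done
        try (Scanner input = new Scanner(file)) {
            // TreeMap keeps the words sorted alphabetically by key
            Map<String, Integer> wordCounts = new TreeMap<String, Integer>();
            while (input.hasNext()) {
                String next = input.next().toLowerCase();
                if (!wordCounts.containsKey(next)) {
                    wordCounts.put(next, 1);
                } else {
                    wordCounts.put(next, wordCounts.get(next) + 1);
                }
            }
            // report frequencies
            for (String word : wordCounts.keySet()) {
                int count = wordCounts.get(word);
                System.out.println(count + "\t" + word);
            }
        } catch (FileNotFoundException e1) {
            // the file does not exist or cannot be opened
            e1.printStackTrace();
        }
    }
}
```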
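The code above prints in alphabetical order because a TreeMap is sorted by key, not by count. One way to report the most frequent words first, shown here as a minimal sketch rather than the only approach, is to copy the map's entries into a list and sort that list by value in descending order. The class name `FrequencySort`, the method `sortByFrequency`, and the sample counts in `main` are hypothetical, chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FrequencySort {

    // Returns the map's entries ordered by count, highest first.
    // List.sort is stable, so when the input is a TreeMap,
    // words with equal counts stay in alphabetical order.
    static List<Map.Entry<String, Integer>> sortByFrequency(Map<String, Integer> wordCounts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(wordCounts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()));
        return entries;
    }

    public static void main(String[] args) {
        // A few counts taken from the output in the question (hypothetical input).
        Map<String, Integer> wordCounts = new TreeMap<>();
        wordCounts.put("a", 2);
        wordCounts.put("i", 4);
        wordCounts.put("my", 3);
        wordCounts.put("to", 2);
        wordCounts.put("who", 1);

        // Same "count<TAB>word" format as the question's report loop
        for (Map.Entry<String, Integer> e : sortByFrequency(wordCounts)) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

In the question's program, calling `sortByFrequency(wordCounts)` and looping over its result would take the place of the `// report frequencies` loop; for the sample counts above it prints `i`, `my`, `a`, `to`, `who` in that order.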
Answer (score: 0)
To remove the ".", you can do the following:
```java
final String cleaned = next.replace(".", "");
```
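A minimal sketch of where that line could go: the period is stripped right after lower-casing, and the cleaned word is used as the map key, so "am" and "am." land on the same entry. This version swaps the question's `containsKey`/`put` pair for `Map.merge`, which does the same increment in one call; the class `WordCounter`, the method `countWords`, and the sample text are hypothetical. Tokens made up only of periods are skipped, and other punctuation such as commas would still need a similar step.

```java
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

public class WordCounter {

    // Counts words from the scanner, lower-cased and with every "." removed,
    // so "am" and "am." (or "i" and "i.") are counted as the same word.
    static Map<String, Integer> countWords(Scanner input) {
        Map<String, Integer> wordCounts = new TreeMap<>();
        while (input.hasNext()) {
            String next = input.next().toLowerCase();
            final String cleaned = next.replace(".", "");
            if (cleaned.isEmpty()) {
                continue; // the token was only periods, e.g. "..."
            }
            // merge stores 1 for a new word, otherwise adds 1 to the old count
            wordCounts.merge(cleaned, 1, Integer::sum);
        }
        return wordCounts;
    }

    public static void main(String[] args) {
        // Two lines from the question's sample file
        String text = "I am who I am from my head to my toes.\nI. am. who. I. am.";
        Map<String, Integer> counts = countWords(new Scanner(text));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

With these two lines, "i" and "am" are each counted 4 times and "who" twice, instead of being split between "i"/"i.", "am"/"am." and "who"/"who.". For a file, `countWords(new Scanner(file))` can be used the same way.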