Question

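I'm confused about how to sort the words from any text file input by frequency. I'm able to count how often each word occurs, but I'm not sure how to sort the words so that the most frequent one comes first.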

Here is what I have for counting the frequencies.

For example, here is a text file that I read:

This is a test run. 
I come with no wrapping or pretty pink bows. 
I am who I am from my head to my toes. 
I tend to get loud when speaking my mind. 
Even a little crazy some of the time.
I. am. who. I. am.

The output of this code is:

2   a
2   am
2   am.
1   bows.
1   come
1   crazy
1   even
1   from
1   get
1   head
4   i
2   i.
1   is
1   little
1   loud
1   mind.
3   my
1   no
1   of
1   or
1   pink
1   pretty
1   run.
1   some
1   speaking
1   tend
1   test
1   the
1   this
1   time.
2   to
1   toes.
1   when
1   who
1   who.
1   with
1   wrapping

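I would also like to know how to ignore the periods, because some identical words are counted separately when one copy ends with a period (for example "am" and "am.").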

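    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.Map;
    import java.util.Scanner;
    import java.util.TreeMap;

    // `file` is the java.io.File being read.
    // try-with-resources closes the Scanner when the block ends.
    try (Scanner input = new Scanner(file)) {
        // TreeMap keeps the words in alphabetical order
        Map<String, Integer> wordCounts = new TreeMap<String, Integer>();
        while (input.hasNext()) {
            String next = input.next().toLowerCase();
            if (!wordCounts.containsKey(next)) {
                wordCounts.put(next, 1);
            } else {
                wordCounts.put(next, wordCounts.get(next) + 1);
            }
        }

        // report frequencies (alphabetical, because of the TreeMap)
        for (String word : wordCounts.keySet()) {
            int count = wordCounts.get(word);
            System.out.println(count + "\t" + word);
        }
    } catch (FileNotFoundException e1) {
        e1.printStackTrace();
    }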
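The counting loop above can be written more compactly with Map.merge (Java 8+). The sketch below is a self-contained variant that reads from a String instead of a File so it can be run directly; the class and method names are hypothetical. It counts exactly like the question's loop: tokens are split on whitespace and lower-cased, so "am" and "am." are still separate keys.

```java
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

public class WordCountMerge {
    // Hypothetical helper: counts whitespace-separated, lower-cased tokens,
    // the same way the question's loop does, using Map.merge instead of
    // containsKey/put.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> wordCounts = new TreeMap<>();
        try (Scanner input = new Scanner(text)) {
            while (input.hasNext()) {
                String next = input.next().toLowerCase();
                // merge stores 1 for a new key, otherwise adds 1 to the old value
                wordCounts.merge(next, 1, Integer::sum);
            }
        }
        return wordCounts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("I am who I am");
        System.out.println(counts); // prints {am=2, i=2, who=1}
    }
}
```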

Answer 1

To remove the ".", you can do the following:

final String cleaned = next.replace(".", "");
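A minimal sketch of where that `replace` call fits in the counting loop, assuming the same whitespace tokenizing as the question and reading from a String for easy testing (class and method names are hypothetical). One detail worth handling: a token made only of periods becomes an empty string after cleaning, so it is skipped rather than counted.

```java
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

public class WordCountNoPeriods {
    // Hypothetical helper: lower-cases a token and removes every "."
    // so that "am." and "am" are counted as the same word.
    static String clean(String token) {
        return token.toLowerCase().replace(".", "");
    }

    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> wordCounts = new TreeMap<>();
        try (Scanner input = new Scanner(text)) {
            while (input.hasNext()) {
                String cleaned = clean(input.next());
                // a token such as "..." is empty after cleaning; skip it
                if (cleaned.isEmpty()) {
                    continue;
                }
                wordCounts.merge(cleaned, 1, Integer::sum);
            }
        }
        return wordCounts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("I. am. who. I. am.")); // prints {am=2, i=2, who=1}
    }
}
```

If commas and other punctuation should be ignored too, `token.toLowerCase().replaceAll("[^a-z]", "")` is one option, with the caveat that it also strips apostrophes (so "don't" becomes "dont").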

To list the words from highest to lowest frequency, sort the entries of the Map<String, Integer> by value.
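A sketch of sorting the counted map by value, highest first, assuming Java 8+ (for `Map.Entry.comparingByValue`). The entries are copied into a list because a TreeMap is always ordered by key, not by value. The class and method names are hypothetical. Because `List.sort` is stable, words with the same count keep the map's alphabetical order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortByFrequency {
    // Hypothetical helper: returns the map's entries ordered by count,
    // highest first. Ties keep the map's own iteration order
    // (alphabetical for a TreeMap) because List.sort is stable.
    static List<Map.Entry<String, Integer>> byCountDescending(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("a", 2);
        counts.put("i", 4);
        counts.put("my", 3);
        counts.put("is", 1);
        // prints 4 i, 3 my, 2 a, 1 is (tab-separated, one per line)
        for (Map.Entry<String, Integer> e : byCountDescending(counts)) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

In the question's code, the loop over `wordCounts.keySet()` would be replaced by a loop over `byCountDescending(wordCounts)`.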
