Question

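I can't figure out how to find the most frequent word and the most frequent case-insensitive word for my program. I have a Scanner that reads the text file and a while loop, but I still don't know how to implement what I'm trying to find. Should I use different String functions to read and print the words?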

Here is my code:

public class letters {
public static void main(String[] args) throws FileNotFoundException {
    FileInputStream fis = new FileInputStream("input.txt");
    Scanner scanner = new Scanner(fis);
    String word[] = new String[500];
    while (scanner.hasNextLine()) {
        String s = scanner.nextLine();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
             }

          }
      String []roll = s.split("\\s");
       for(int i=0;i<roll.length;i++){
           String lin = roll[i];
           //System.out.println(lin);
      }
 }

This is what I have so far. I need the output to say:

   Word:
   6 roll

  Case-insensitive word:
  18 roll

Here is my input file:

@
roll tide roll!
Roll Tide Roll!
ROLL TIDE ROLL!
ROll tIDE ROll!
 roll  tide  roll! 
 Roll  Tide  Roll! 
 ROLL  TIDE  ROLL! 
   roll    tide    roll!   
    Roll Tide Roll  !   
@
65-43+21= 43
65.0-43.0+21.0= 43.0
 65 -43 +21 = 43 
 65.0 -43.0 +21.0 = 43.0 
 65 - 43 + 21 = 43 
 65.00 - 43.0 + 21.000 = +0043.0000 
    65   -  43  +   21  =   43

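I just need it to find the most frequently occurring word, where a word is a maximal sequence of consecutive letters (i.e. roll), and print the number of times it occurs (i.e. 6). If anyone can help, that would be great! Thanks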

Answer 1

Consider using a Map<String,Integer> for the words. With it you can count the words, and it works for any number of distinct words. See Documentation for Map

Something like this (it needs to be modified for case-insensitivity):

public Map<String,Integer> words_count = new HashMap<String,Integer>();

//read your line (you will have to determine whether this line should be split or is an equation)
//also note that the trailing '!' would need to be removed

String[] words = line.split("\\s+");
for(int i=0;i<words.length;i++)
{
     String s = words[i];
     if(words_count.containsKey(s))
     {
          Integer count = words_count.get(s) + 1;
          words_count.put(s, count);
     }
     else
          words_count.put(s, 1);

}

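Then you have the number of occurrences of each word in the string, and finding the one that occurs most often goes something like this: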

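Integer frequency = null;
String mostFrequent = null;
for(String s : words_count.keySet())
{
    Integer i = words_count.get(s);
    // record the word on the first iteration too, otherwise mostFrequent
    // stays null whenever the first key is also the most frequent one
    if(frequency == null || i > frequency)
    {
         frequency = i;
         mostFrequent = s;
    }
}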

Then print:

System.out.println("The word "+ mostFrequent +" occurred "+ frequency +" times");
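The question defines a word as a maximal run of letters, so splitting on whitespace would leave tokens such as "roll!" intact. As a rough sketch of one alternative (not part of the answer above), a regular expression can pull out the letter runs directly, and Map.merge can do the counting. The class name LetterRunCounter, the method count, and the ignoreCase flag are made up for illustration, and the pattern assumes that only ASCII letters count as letters.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LetterRunCounter {
    // Assumption: a "word" is a maximal run of ASCII letters, as the question describes.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    /** Counts every letter run in text; keys are lower-cased when ignoreCase is true. */
    public static Map<String, Integer> count(String text, boolean ignoreCase) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            String word = ignoreCase ? m.group().toLowerCase(Locale.ROOT) : m.group();
            // merge stores 1 for a new key and adds 1 to the count of an existing key
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample = "roll tide roll!\nRoll Tide Roll!\n65-43+21= 43";
        System.out.println(count(sample, false)); // case-sensitive counts
        System.out.println(count(sample, true));  // case-insensitive counts
    }
}
```

Digits and punctuation never match the pattern, so the equation lines in the input file contribute no words at all.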

Answer 2

First accumulate all the words into a Map, like this:

...
String[] roll = s.split("\\s+");
for (final String word : roll) {
    Integer qty = words.get(word);
    if (qty == null) {
        qty = 1;
    } else {
        qty = qty + 1;
    }
    words.put(word, qty);
}
...

Then you need to find out which one has the highest count:

String bestWord = null; // must be initialized, or the println below does not compile
int maxQty = 0;
for(final String word : words.keySet()) {
    if(words.get(word) > maxQty) {
        maxQty = words.get(word);
        bestWord = word;
    }
}
System.out.println("Word:");
System.out.println(Integer.toString(maxQty) + " " + bestWord);

Finally, you need to merge all forms of the same word together:

Map<String, Integer> wordsNoCase = new HashMap<String, Integer>();
for(final String word : words.keySet()) {
    Integer qty = wordsNoCase.get(word.toLowerCase());
    if(qty == null) {
        qty = words.get(word);
    } else {
        qty += words.get(word);
    }
    wordsNoCase.put(word.toLowerCase(), qty);
}
words = wordsNoCase;

Then re-run the previous snippet to find the word with the highest count.
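Since the "find the best word" step has to run twice (once on the original map and once on the case-folded one), it may help to wrap both steps in methods. The following is only a sketch under that assumption. The names BestWord, mostFrequent, and foldCase are invented for illustration, and the stream-based maximum is a swapped-in alternative to the explicit loop above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class BestWord {
    /** Returns the entry with the largest count, or an empty Optional for an empty map. */
    public static Optional<Map.Entry<String, Integer>> mostFrequent(Map<String, Integer> counts) {
        return counts.entrySet().stream().max(Map.Entry.comparingByValue());
    }

    /** Adds together the counts of keys that differ only by case, under a lower-case key. */
    public static Map<String, Integer> foldCase(Map<String, Integer> counts) {
        Map<String, Integer> folded = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            folded.merge(e.getKey().toLowerCase(Locale.ROOT), e.getValue(), Integer::sum);
        }
        return folded;
    }

    public static void main(String[] args) {
        // Hypothetical counts, chosen only to exercise the two methods.
        Map<String, Integer> words = new HashMap<>();
        words.put("roll", 6);
        words.put("Roll", 5);
        words.put("tide", 3);
        // prints "6 roll"
        mostFrequent(words).ifPresent(e -> System.out.println(e.getValue() + " " + e.getKey()));
        // prints "11 roll"
        mostFrequent(foldCase(words)).ifPresent(e -> System.out.println(e.getValue() + " " + e.getKey()));
    }
}
```

If two words share the highest count, which one is returned is not specified here, so a tie-breaking rule would have to be added if the assignment requires one.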

Answer 3

Try using a HashMap for better results. You need a BufferedReader and a FileReader to read the input file, as shown below:

FileReader text = new FileReader("file.txt");
BufferedReader textFile = new BufferedReader(text);

The BufferedReader object textFile needs to be passed as an argument to the method below:

public HashMap<String, Integer> countWordFrequency(BufferedReader textFile) throws IOException
{
/*This method finds the frequency of words in a text file
 * and saves the word and its corresponding frequency in 
 * a HashMap.
 */
    HashMap<String, Integer> mapper = new HashMap<String, Integer>();
    StringBuffer multiLine = new StringBuffer("");
    String line = null;
    if(textFile.ready())
    {
        while((line = textFile.readLine()) != null)
        {
            multiLine.append(line);
            String[] words = line.replaceAll("[^a-zA-Z]", " ").toLowerCase().split(" ");
            for(String word : words)
            {
                if(!word.isEmpty())
                {
                    Integer freq = mapper.get(word);
                    if(freq == null)
                    {
                        mapper.put(word, 1);
                    }
                    else
                    {
                        mapper.put(word, freq+1);
                    }
                }
            }
        }
        textFile.close();
    }
    return mapper;
}

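The line line.replaceAll("[^a-zA-Z]", " ").toLowerCase().split(" "); replaces every character that is not a letter with a space, converts all words to lower case (which takes care of your case-insensitivity requirement), and then splits the words on spaces. Because runs of several spaces produce empty strings in the resulting array, the loop skips them with the isEmpty() check.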
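To see what that chain produces on the lines from the question's input file, here is a small illustrative sketch. The class name LineTokens and the method tokens are hypothetical. It applies the same replace, lower-case, and split steps and then drops the empty strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LineTokens {
    /** Applies the replace / lower-case / split chain, then drops the empty tokens. */
    public static List<String> tokens(String line) {
        String[] raw = line.replaceAll("[^a-zA-Z]", " ").toLowerCase(Locale.ROOT).split(" ");
        List<String> result = new ArrayList<>();
        for (String token : raw) {
            // consecutive spaces leave empty strings between the real words
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens(" Roll  Tide  Roll! ")); // prints [roll, tide, roll]
        System.out.println(tokens("65-43+21= 43"));        // prints []
    }
}
```

Locale.ROOT is passed to toLowerCase so that the result does not depend on the machine's default locale; the answer's plain toLowerCase() behaves the same way in most locales.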

/*This method finds the highest value in HashMap
 * and returns the same.
 */
public int maxFrequency(HashMap<String, Integer> mapper)
{
    int maxValue = Integer.MIN_VALUE;
    for(int value : mapper.values())
    {
        if(value > maxValue)
        {
            maxValue = value;
        }
    }
    return maxValue;
}

The code above returns the highest value in the HashMap.

/*This method prints the HashMap Key with a particular Value.
 */
public void printWithValue(HashMap<String, Integer> mapper, Integer value)
{
    for (Entry<String, Integer> entry : mapper.entrySet()) 
    {
        if (entry.getValue().equals(value)) 
        {
            System.out.println("Word : " + entry.getKey() + " \nFrequency : " + entry.getValue());
        }
    }
}

Now you can print the most frequent word and its frequency using the methods shown above.

Answer 4

    /* A LinkedHashMap holds each word (String) as the key and its number of occurrences as the value.
     * Create a BufferedReader object and read the first line into currentLine.
     * In the while loop, split currentLine into words and iterate over them with a for loop.
     * Inside the for loop, an if/else increments the word's count by 1 if it is already
     * present in the map, and otherwise stores it with a count of 1.
     * Then read the next line into currentLine.
     */
    public static void main(String[] args) {

        Map<String, Integer> map = new LinkedHashMap<String, Integer>();

        BufferedReader reader = null;

        try {
            reader = new BufferedReader(new FileReader("F:\\chidanand\\javaIO\\Student.txt"));
              String currentLine = reader.readLine();
            while (currentLine!= null) {
                String[] input = currentLine.replaceAll("[^a-zA-Z]", " ").toLowerCase().split(" ");
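                  for (int i = 0; i < input.length; i++) {
                    // split(" ") yields empty strings for consecutive spaces;
                    // skip them so "" is never counted as a word
                    if (input[i].isEmpty()) {
                        continue;
                    }
                    if (map.containsKey(input[i])) {
                        int count = map.get(input[i]);
                        map.put(input[i], count + 1);

                    } else {
                        map.put(input[i], 1);
                    }

                }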
                   currentLine = reader.readLine();
            }

            String mostRepeatedWord = null;
             int count = 0;
                 for (Entry<String, Integer> m:map.entrySet())
                    {
                        if(m.getValue() > count)
                        {
                           mostRepeatedWord = m.getKey();

                            count = m.getValue();
                        }
                    }

                 System.out.println("The most repeated word in input file is : "+mostRepeatedWord);

                    System.out.println("Number Of Occurrences : "+count);

        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                // reader is still null if the file could not be opened
                if (reader != null) {
                    reader.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }

        }

    }
}

How do I find the words in a text file and print the most frequent word using an array?
