Question

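Suppose we have a txt file and we want to know how many times each word in it occurs. I used the code below, but it doesn't work: every count comes out as 1. First, I read the txt file and write each word on a separate line; at the same time, I put the words into an ArrayList. Then I read the first line of the txt file, take the first element of the ArrayList, and compare it with the whole txt file. Whenever there is a match, I increment the corresponding entry of an array that holds the number of occurrences. Then I take the second ArrayList item, and so on, until I reach the end of the ArrayList.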

private static void count(String text) throws FileNotFoundException, IOException {
    // Check is presumably a String field holding the output file name (not shown in the question)
    FileOutputStream thewords = new FileOutputStream(Check);

    ArrayList<String> keyArrayList = new ArrayList<String>();
    int countWord = 0;

    StringTokenizer tokenizer = new StringTokenizer(text);

    // Write each word on its own line and also store it in the list
    while (tokenizer.hasMoreTokens()) {
        String nextWord = tokenizer.nextToken();
        keyArrayList.add(nextWord);
        thewords.write(nextWord.getBytes());
        thewords.write(System.getProperty("line.separator").getBytes());
        countWord++;
    }

    int[] numbOfOccurance = new int[countWord];

    // Compare each line of the file with the list entry at the same index
    BufferedReader br = new BufferedReader(new FileReader(Check));
    String readline;
    for (int loopIndex = 0; loopIndex < countWord; loopIndex++) {
        readline = br.readLine();
        String test = keyArrayList.get(loopIndex);
        if (test.equals(readline)) {
            numbOfOccurance[loopIndex]++;
        }
    }
}

Answer 1

Your approach is very slow: you have to search the whole ArrayList to find out whether a given word occurs more than once.
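To make that cost concrete, here is a minimal sketch of a list-only counter (the class ListWordCount and its methods are hypothetical, not from the question). Every token triggers a linear indexOf search over the distinct words seen so far, so the total work grows roughly with the number of tokens times the number of distinct words.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical list-only word counter: words.get(i) has been seen counts.get(i) times
class ListWordCount {
    final List<String> words = new ArrayList<>();
    final List<Integer> counts = new ArrayList<>();

    void add(String word) {
        // indexOf is a linear scan over every distinct word seen so far
        int i = words.indexOf(word);
        if (i < 0) {
            words.add(word);
            counts.add(1);
        } else {
            counts.set(i, counts.get(i) + 1);
        }
    }

    int countOf(String word) {
        int i = words.indexOf(word);
        return i < 0 ? 0 : counts.get(i);
    }

    public static void main(String[] args) {
        ListWordCount c = new ListWordCount();
        for (String w : "the cat and the hat".split(" ")) {
            c.add(w);
        }
        System.out.println(c.countOf("the")); // prints 2
    }
}
```

It works, but the repeated scans are exactly what a hash-based map avoids.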

Also, StringTokenizer is a legacy class whose use is discouraged in new code.
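The StringTokenizer Javadoc points to String.split or the java.util.regex package instead. A minimal sketch of a whitespace split that roughly matches StringTokenizer's default delimiters (the class and method names are hypothetical):

```java
import java.util.Arrays;

class SplitInsteadOfTokenizer {
    // Splits on runs of whitespace, roughly like StringTokenizer's default " \t\n\r\f"
    static String[] tokens(String text) {
        String trimmed = text.trim();
        // "".split(...) returns [""], so treat blank input as "no tokens"
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokens("  to be\tor not\nto be ")));
        // prints [to, be, or, not, to, be]
    }
}
```

Trimming first matters: without it, leading whitespace would produce an empty first token.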

I would suggest the following approach:

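import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Stream;

import static java.util.function.Function.identity;
import static java.util.stream.Collectors.toMap;

public static void main(String[] args) throws Exception {
    final Path path = Paths.get("path", "to", "file");
    final Map<String, Integer> counts = countOccurrences(path);
}

private static Map<String, Integer> countOccurrences(Path path) throws IOException {
    // Any run of characters that are not letters or apostrophes separates words
    final Pattern pattern = Pattern.compile("[^A-Za-z']+");
    try (final Stream<String> lines = Files.lines(path)) {
        return lines
                .flatMap(pattern::splitAsStream)
                // a line starting with a separator yields an empty first token; drop it
                .filter(word -> !word.isEmpty())
                .collect(toMap(identity(), w -> 1, Integer::sum));
    }
}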

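This uses the Java 8 Stream API to read the lines of the file. It then splits each line on [^A-Za-z']+, i.e. on runs of characters that are neither letters nor apostrophes, and uses flatMap to turn the result into a Stream of individual words.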
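One detail worth checking with this pattern: when a line begins with a separator (leading spaces, an opening quote), splitAsStream yields an empty leading token, which would then be counted as a "word". A small check, assuming the same regular expression (the class SplitDemo is hypothetical):

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SplitDemo {
    static final Pattern NON_WORD = Pattern.compile("[^A-Za-z']+");

    static List<String> words(String line) {
        return NON_WORD.splitAsStream(line)
                .filter(w -> !w.isEmpty()) // drop the empty token from a leading separator
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String line = "\"Don't panic!\"";
        // Unfiltered: the leading quote produces an empty first element
        System.out.println(NON_WORD.splitAsStream(line).collect(Collectors.toList())); // prints [, Don't, panic]
        System.out.println(words(line)); // prints [Don't, panic]
    }
}
```

Trailing separators are not a problem: splitAsStream discards trailing empty strings.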

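Next, we collect the words into a Map with toMap, mapping each word to 1. When the same word shows up again, the merge function Integer::sum adds the two values together, so each word ends up mapped to its number of occurrences.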
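The merge behaviour of toMap can be seen in isolation on an in-memory list, without any file I/O. A minimal sketch (the class ToMapMergeDemo and its sample words are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

import static java.util.function.Function.identity;

class ToMapMergeDemo {
    static Map<String, Integer> count(List<String> words) {
        return words.stream()
                // key = the word, value = 1 per occurrence;
                // on a duplicate key, Integer::sum adds the old and new values
                .collect(Collectors.toMap(identity(), w -> 1, Integer::sum));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(Arrays.asList("a", "rose", "is", "a", "rose"));
        // Copy into a TreeMap only to get a stable, alphabetical print order
        Map<String, Integer> sorted = new TreeMap<>(counts);
        System.out.println(sorted); // prints {a=2, is=1, rose=2}
    }
}
```

A common alternative is Collectors.groupingBy(identity(), Collectors.counting()), which produces a Map<String, Long> instead of Map<String, Integer>.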

You can then print the contents of the Map, sorted by number of occurrences (least frequent first), with:

counts.entrySet().stream()
        .sorted(Map.Entry.comparingByValue())
        .map(e -> String.format("%s -> %s", e.getKey(), e.getValue()))
        .forEach(System.out::println);
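If the most frequent words should come first instead, the comparator can be reversed. A minimal sketch, assuming the counts are held in a Map<String, Integer> as above (the class SortByCountDemo and its sample map are hypothetical):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SortByCountDemo {
    // Returns "word -> count" lines, most frequent first
    static List<String> byCountDescending(Map<String, Integer> counts) {
        return counts.entrySet().stream()
                // explicit type arguments: inference cannot see through .reversed()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .map(e -> String.format("%s -> %s", e.getKey(), e.getValue()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("rose", 2);
        counts.put("is", 1);
        counts.put("a", 3);
        // prints "a -> 3", "rose -> 2", "is -> 1" on separate lines
        byCountDescending(counts).forEach(System.out::println);
    }
}
```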

Answer 2

As @Pratik pointed out first, this is a classic use case for a HashMap. You only need to go through the words once.

HashMap<String, Integer> wordMap = new HashMap<String, Integer>();
StringTokenizer tokenizer = new StringTokenizer(text);

// Single pass: a new word starts at 1, a word already in the map gets +1
while (tokenizer.hasMoreTokens()) {
    String nextWord = tokenizer.nextToken();
    Integer count = wordMap.get(nextWord);
    if (count == null) {
        wordMap.put(nextWord, 1);
    } else {
        wordMap.put(nextWord, count + 1);
    }
}

// Print word count
for (String key : wordMap.keySet()) {
    System.out.println(key + " count: " + wordMap.get(key));
}
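On Java 8 or later, the get / null check / put sequence can be collapsed into Map.merge, which performs the same "insert 1 or add 1" step. A minimal sketch that also swaps String.split in for StringTokenizer (the class MergeWordCount is hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

class MergeWordCount {
    static Map<String, Integer> count(String text) {
        Map<String, Integer> wordMap = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) continue; // only happens when text is blank
            // Puts 1 for a new word, otherwise replaces the value with old + 1
            wordMap.merge(word, 1, Integer::sum);
        }
        return wordMap;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : count("to be or not to be").entrySet()) {
            System.out.println(e.getKey() + " count: " + e.getValue());
        }
    }
}
```

Iterating over entrySet() also avoids the extra lookup that keySet() plus get() performs for every key.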

As for why your current implementation doesn't work:

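I don't think this is feasible using only arrays. With your current code, you create an int array whose size is the total number of words, not the number of distinct words. Even if you used an ArrayList<Integer> and dynamically added a new entry for each new word you encounter, you would still need to loop over the whole list to process a single word. Also, how would you keep track of which word corresponds to which entry in the integer array?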
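The all-ones result itself comes from the comparison loop: line loopIndex of the file is exactly the token stored at keyArrayList.get(loopIndex), so each word is only ever compared with itself and every counter ends at 1. A minimal in-memory reproduction, assuming the file contains exactly the tokens written in the first loop (the class WhyAllOnes and method questionLoop are hypothetical):

```java
import java.util.Arrays;
import java.util.List;

class WhyAllOnes {
    // Mirrors the question's loop, with the file's lines passed in as a list
    static int[] questionLoop(List<String> keyArrayList, List<String> fileLines) {
        int[] numbOfOccurance = new int[keyArrayList.size()];
        for (int i = 0; i < keyArrayList.size(); i++) {
            // Line i is compared only with word i, which is the same word
            if (keyArrayList.get(i).equals(fileLines.get(i))) {
                numbOfOccurance[i]++;
            }
        }
        return numbOfOccurance;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "not", "to", "be");
        // The "file" holds the same tokens in the same order
        System.out.println(Arrays.toString(questionLoop(words, words))); // prints [1, 1, 1, 1, 1, 1]
    }
}
```

Repeated words ("to", "be") still get 1 each, because no word is ever compared with the other positions.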
