Question

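I need to make a word counter in Java that counts how many times each word occurs in a file. For example, if the file contains the sentence "A bird and a rabbit were eating lunch the bird likes apples and the rabbit likes strawberries. They shared lunch and both liked apples and strawberries." then the counter would find and output how many times each word was used. The output would be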

A: 2
AND: 4
APPLES: 2
BIRD: 2
BOTH: 1
EATING: 1
etc...

I know very little about Java, but I need to do this. Does anyone have any ideas on how to go about it?

Answer 1

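First, consider the file size. If it is a large file, the idea of using a Map may cause memory problems, and you will have to look at different approaches. Second, think about which Java version you intend to learn, since a newer version can simplify some of the steps. Since you are a learner, you can follow a simple algorithm using a Map, as most of the comments suggest.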
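If file size is a concern, one option is to read one line at a time instead of loading every line into a list first. The following is only a minimal sketch, assuming a UTF-8 text file with space-separated words; the names StreamingWordCount and countWords are made up for illustration. Note that the map itself still grows with the number of distinct words.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: counts words while holding only one line in memory at a time.
class StreamingWordCount {

    static Map<String, Integer> countWords(Path file) throws IOException {
        Map<String, Integer> counter = new HashMap<>();
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.split(" ")) {
                    if (word.isEmpty()) {
                        continue; // consecutive spaces produce empty tokens
                    }
                    Integer val = counter.get(word);
                    counter.put(word, val == null ? 1 : val + 1);
                }
            }
        }
        return counter;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java StreamingWordCount <file>");
            return;
        }
        System.out.println(countWords(Paths.get(args[0])));
    }
}
```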

Algorithm and code snippets

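Read the file line by line: You can do this in several ways, either with standard Java classes such as Scanner or BufferedReader, or with a third-party library such as Apache Commons. Better still, if you are on JDK 7 or later, you can use the Files class: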

List<String> list = Files.readAllLines(new File("test.txt").toPath());
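Iterate over the list of lines and get the words: Just use a for loop. Again, if you are on JDK 5+, the for-each loop is your best approach. Use the split method of the String class to get the individual words and iterate over them: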

for(String line : list){
    for(String word : line.split(" ")){
        //More code
    }
}
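Splitting on a single space leaves punctuation attached, so "strawberries." and "strawberries" would be counted as different words. One possible refinement, offered as a sketch and not part of the original answer, is to split on runs of characters that are neither letters nor digits. The class name Tokenizer and method name words are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating punctuation-aware tokenizing.
class Tokenizer {

    // Splits on runs of characters that are neither letters nor digits.
    static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        for (String token : line.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) { // a leading separator yields one empty token
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("They shared lunch, and both liked apples."));
    }
}
```

Be aware that this pattern also splits on apostrophes and hyphens, so "don't" becomes "don" and "t". Whether that is acceptable depends on the text being counted.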
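Add to the map: Maintain a map with the word as the key and its count as the value. For each word, check whether it is already in the map. If it is, get its count and increment it; otherwise add the word with a count of 1. Repeat this until the two for loops from step 2 finish.

//Initialize outside the loops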
Map<String, Integer> counter = new HashMap<String, Integer>();

//Inside loop

Integer val = counter.get(word);
if(val == null){
    counter.put(word, 1);
} else {
    counter.put(word, ++val);
}
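As an example of how the Java version can simplify a step: on Java 8 or later, the get/check/put sequence can be written with Map.merge. This is a sketch of an alternative, not what the answer itself uses; the class name MergeCount is made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the same counting step using Map.merge (Java 8 and later).
class MergeCount {

    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> counter = new HashMap<>();
        for (String word : words) {
            // inserts 1 if the word is absent, otherwise adds 1 to the stored count
            counter.merge(word, 1, Integer::sum);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(count("a bird and a rabbit".split(" ")));
    }
}
```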
Print the values, again using a for-each loop, this time over keySet:

for(String key : counter.keySet()){
    System.out.println(key + " : " + counter.get(key));
}

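In step 3, however, you need to consider the case sensitivity of the keys. If you need a case-insensitive comparison, use the word.toUpperCase() method when inserting and looking up keys.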
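A minimal sketch of the case-insensitive variant follows. It upper-cases each word before using it as a key, and it swaps HashMap for TreeMap (my substitution, not the answer's) so that the keys come out in alphabetical order, as in the output shown in the question. The class name CaseInsensitiveCount is hypothetical, and Locale.ROOT is passed so the result does not depend on the machine's default locale.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts words ignoring case, with keys kept in sorted order.
class CaseInsensitiveCount {

    static Map<String, Integer> count(String text) {
        Map<String, Integer> counter = new TreeMap<>(); // TreeMap keeps keys sorted
        for (String word : text.split(" ")) {
            if (word.isEmpty()) {
                continue; // skip empty tokens from repeated spaces
            }
            String key = word.toUpperCase(Locale.ROOT); // normalize case before lookup
            Integer val = counter.get(key);
            counter.put(key, val == null ? 1 : val + 1);
        }
        return counter;
    }

    public static void main(String[] args) {
        Map<String, Integer> counter = count("A bird and a rabbit and the bird");
        for (Map.Entry<String, Integer> entry : counter.entrySet()) {
            System.out.println(entry.getKey() + " : " + entry.getValue());
        }
    }
}
```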

Putting it all together

import java.io.File;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.List;
import java.util.Map;


public class WordCounter {

    public static void main(String[] args) throws Exception {
        Map<String, Integer> counter = new HashMap<String, Integer>();

        List<String> list = Files.readAllLines(new File("test.txt").toPath());
        for(String line : list){
            for(String word : line.split(" ")){
                Integer val = counter.get(word);
                if(val == null){
                    counter.put(word, 1);
                } else {
                    counter.put(word, ++val);
                }
            }
        }

        for(String key : counter.keySet()){
            System.out.println(key + " : " + counter.get(key));
        }
    }

}

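For learning, you can play with this to find the unique words, the repeated words, the words repeated more than 3 times, and so on. Happy coding :)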
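As a starting point for those experiments, here is a small sketch that filters a finished count map. The class and method names (CountFilter, moreThan, exactly) are invented for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helpers that select words from a word-count map by their count.
class CountFilter {

    // Words whose count is strictly greater than threshold.
    static List<String> moreThan(Map<String, Integer> counter, int threshold) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counter.entrySet()) {
            if (entry.getValue() > threshold) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    // Words whose count is exactly the given value, e.g. 1 for words used once.
    static List<String> exactly(Map<String, Integer> counter, int count) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counter.entrySet()) {
            if (entry.getValue() == count) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counter = new HashMap<>();
        counter.put("and", 4);
        counter.put("bird", 2);
        counter.put("both", 1);
        System.out.println("More than 3 times: " + moreThan(counter, 3));
        System.out.println("Used once: " + exactly(counter, 1));
    }
}
```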

Answer 2

Here is a complete working solution that you can use as a reference. I have added comments to make the program easier to understand:

import java.io.BufferedReader; //imports
import java.io.FileReader;
import java.util.*;

public class Test {

    public static void main(String args[]) {

        String name[] = null; //array to store each individual word in the file -separated by whitespace

        try { //code to read input from file
            BufferedReader br = new BufferedReader(
                    new FileReader("D:\\file.txt")); //Enter your complete file path here
            StringBuilder sb = new StringBuilder();
            String line = br.readLine();

            while (line != null) {
                sb.append(line).append(' ');  //read through the entire file, adding a space so words on adjacent lines stay separate
                line = br.readLine();
            }
            String everything = sb.toString();  //append everything to a single string
            name = everything.trim().split("\\s+"); //split the entire string on runs of whitespace
            br.close();  //close the BufferedReader
        } catch (Exception e) {

            e.printStackTrace();
        }

        Map<String, Integer> map = new HashMap<String, Integer>(); //Map to store the (word, No of
                                                                      // occurrences of that word)
                                                                      //word is key and occurrences is value
        Set<String> set = new HashSet<String>(); //set to eliminate duplicate words and store unique words
        int counter = 0; //main counter that counts occurrences
        int i = 0;

        try {
            for (i = 0; i < name.length; i++) {
                set.add(name[i].toLowerCase()); //populate set from name array
            }

            System.out.println(set); //output set

            Iterator<String> it = set.iterator(); //iterate over the set, matching each unique word against all the words stored in the name array
            while (it.hasNext()) {

                String temp = (String) it.next();
// for each word iterate the name array and look for matches, initialize counter to zero for each word 
                for (i = 0, counter = 0; i < name.length; i++) {

                    if (temp.equalsIgnoreCase(name[i])) {//condition to check if the word in set matches word in name array

                        counter = counter + 1; // increase occurrence counter if it does
                        if (map.containsKey(temp)) {//if word is already inserted in the map then remove and insert it again with updated counter

                            map.remove(temp); 
                            map.put(temp, counter);

                        } else {

                            map.put(temp, counter);// if it is the first time entering the word in map simply enter with current counter

                        }

                    }

                }

            }
            System.out.println(map); // print the map
        } catch (Exception e) {

            e.printStackTrace();
        }
    }
}
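The program above rescans the whole word array once for every unique word. For comparison, here is a sketch of a single-pass alternative using Java 8 streams. It is a different technique from the one in this answer, and the class name StreamWordCount is hypothetical. Counts come back as Long because that is what Collectors.counting() produces.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Sketch: groups words by their lower-cased form and counts each group in one pass.
class StreamWordCount {

    static Map<String, Long> count(String text) {
        return Arrays.stream(text.trim().split("\\s+"))   // split on runs of whitespace
                .filter(word -> !word.isEmpty())            // empty input yields one empty token
                .collect(Collectors.groupingBy(
                        word -> word.toLowerCase(Locale.ROOT), // key: case-normalized word
                        TreeMap::new,                          // sorted keys for readable output
                        Collectors.counting()));               // value: number of occurrences
    }

    public static void main(String[] args) {
        System.out.println(count("The bird and the Rabbit\nand THE bird"));
    }
}
```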

Counting words in a file with a Java word counter
