Split a text file into words and count the occurrences of each word in Java

Time: 2015-08-23 13:20:46

Tags: java split

This is my code. What is wrong with it?
The result I get is that every word occurs once.

HashMap<String, WordData> Words = new HashMap<String, WordData>();
try {
    File f1 = new File(Path);
    Scanner scan1 = new Scanner(f1, "UTF-8");
    String word, line;
    WordData wordData;
    String[] wordsOfLine;
        while (scan1.hasNext()) {
            line = scan1.nextLine().trim();
            wordsOfLine = line.split("\\s");

            for (int i = 0; i < wordsOfLine.length&&wordsOfLine[i]!=""; i++) {

                word = wordsOfLine[i].trim();
                if (Words.get(word)==null){
                     wordData = new WordData(1, "");
                    Words.put(word, wordData);
                } else {
                    wordData = Words.get(word);
                    wordData.IncFreq();
                    Words.put(word, wordData);
                }
            }
        }
} catch (Exception ex) {

}

1 answer:

Answer 0 (score: 0)

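The main problem in your code is that when a word is added for the first time, you store an empty string as its value: new WordData(1, "") instead of new WordData(1, word).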

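import java.io.File;
import java.util.HashMap;
import java.util.Scanner;

// Holds a word together with the number of times it has been seen
class WordData {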
    public int getFrequency() {
        return frequency;
    }

    public void setFrequency(int frequency) {
        this.frequency = frequency;
    }

    public String getWord() {
        return word;
    }

    public void setWord(String word) {
        this.word = word;
    }

    private int frequency=0;
    private String word;

    public WordData(int frequency, String word) {
        this.frequency = frequency;
        this.word = word;
    }

    public void increaseFrequency(){
        this.frequency =this.frequency+1;
    }
}

public class NoOfWordsInFile {
    public static void main( String args[]) {
        HashMap<String, WordData> Words = new HashMap<String, WordData>();
        try {
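            // Assumption: the input file path is passed as the first command-line argument
            String filepath = args[0];
            File f1 = new File(filepath);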
            Scanner scan1 = new Scanner(f1, "UTF-8");
            String word, line;
            WordData wordData;
            String[] wordsOfLine;
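            while (scan1.hasNextLine()) {
                line = scan1.nextLine().trim();
                // Split on runs of whitespace so repeated spaces do not yield empty tokens
                wordsOfLine = line.split("\\s+");

                for (int i = 0; i < wordsOfLine.length; i++) {

                    word = wordsOfLine[i].trim();
                    // Skip empty tokens (a blank line yields one); isEmpty() compares
                    // content, whereas != "" compared references and ended the loop early
                    if (word.isEmpty()) {
                        continue;
                    }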
                    if (Words.get(word)==null){
                        wordData = new WordData(1, word);
                        Words.put(word, wordData);
                    } else {
                        wordData = Words.get(word);
                        wordData.increaseFrequency();
                        Words.put(word, wordData);
                    }
                }
            }
            scan1.close();
            // Print each distinct word with its count
            for (WordData s : Words.values()){
                System.out.println(s.getWord() +": " + s.getFrequency());
            }

        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}

Also note that this code is case-sensitive. You need to handle that explicitly if you want the count to be case-insensitive.
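
One possible way to make the count case-insensitive is to normalize each word before using it as the map key. The sketch below is only an illustration, not part of the code above: the class name CaseInsensitiveWordCount and the helper countWords are hypothetical, and it counts into a plain Map<String, Integer> instead of WordData to stay short. In the code above, the same effect could likely be achieved by lowercasing word right after it is read from wordsOfLine.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical example class, not part of the original answer
class CaseInsensitiveWordCount {

    // Hypothetical helper: counts the words in a string, ignoring case
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Trim first so there is no leading empty token, then split on whitespace runs
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // only happens when the input is blank
            }
            // Normalize so that "The", "the" and "THE" share one key
            String key = token.toLowerCase(Locale.ROOT);
            // Insert 1 for a new key, otherwise add 1 to the existing count
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("The the THE cat");
        System.out.println(counts.get("the")); // 3
        System.out.println(counts.get("cat")); // 1
    }
}
```

Locale.ROOT is used so that lowercasing does not depend on the machine's default locale. Note that punctuation is still part of each token here, so "cat" and "cat," would be counted separately.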