Merging two methods that each read their own file into one method that reads a single file

Time: 2014-02-27 19:08:32

Tags: java refactoring

I am building a Bayesian filtering system in Java. At the moment my code learns spam and good text from two separate .txt files: learn.spam("spam.txt"); learn.good("good.txt");

The two methods are almost identical:

public void good(String file) throws IOException {
    A2ZFileReader fr = new A2ZFileReader(file);

    String content = fr.getContent();
    String[] tokens = content.split(splitregex);
    int goodTotal = 0;

    for (int i = 0; i < tokens.length; i++) {
        String word = tokens[i].toLowerCase();
        Matcher m = wordregex.matcher(word);
        if (m.matches()) {
            goodTotal++;
            if (words.containsKey(word)) {
                Word w = (Word) words.get(word);
                w.countGood();
            } else {
                Word w = new Word(word);
                w.countGood();
                words.put(word,w);
            }
        }
    }
}

public void spam(String file) throws IOException {
    A2ZFileReader fr = new A2ZFileReader(file);

    String content = fr.getContent();
    String[] tokens = content.split(splitregex);
    int spamTotal = 0;//tokenizer.countTokens();

    for (int i = 0; i < tokens.length; i++) {
        String word = tokens[i].toLowerCase();
        Matcher m = wordregex.matcher(word);
        if (m.matches()) {
            spamTotal++;
            if (words.containsKey(word)) {
                Word w = (Word) words.get(word);
                w.countBad();
            } else {
                Word w = new Word(word);
                w.countBad();
                words.put(word,w);
            }
        }
    }

    Iterator iterator = words.values().iterator();
    while (iterator.hasNext()) {
        Word word = (Word) iterator.next();
        word.calcBadProb(spamTotal);
    }
}

The problem I need to solve now is that, instead of two .txt files, I have a single file that looks like this:

spam    Gamble tonight only for a cheap price of $5 per hand.

ham     Sex, I love it. I need it now.

ham     yeah I know, I am going tonight that that place, ;) Come join me. You know you want to

ham     It is pretty expensive, just this and that for only ($900)

spam    Call 123123123 to use for free porn

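There is exactly one message per line. Spam messages start with "spam" and good messages start with "ham", followed by a tab. How can I change the methods so that I train the filter with just one method and one .txt file?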

1 answer:

Answer 0: (score: 1)

Change the good method so that it takes the message text instead of a file name:

public void good(String content) {
    String[] tokens = content.split(splitregex);
    int goodTotal = 0;


    for (int i = 0; i < tokens.length; i++) {
        String word = tokens[i].toLowerCase();
        Matcher m = wordregex.matcher(word);
        if (m.matches()) {
            goodTotal++;
            if (words.containsKey(word)) {
                Word w = (Word) words.get(word);
                w.countGood();
            } else {
                Word w = new Word(word);
                w.countGood();
                words.put(word,w);
            }
        }
    }
}

spam does almost exactly the same thing.

Then write a method train that reads the file, splits it into lines, and calls the right method depending on the first word of each line.
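A minimal sketch of such a train method follows. It assumes the label and the message are separated by whitespace (a tab in the question's file) and that the file is UTF-8. The class name TrainSketch, the helper trainLines, and the call counters are hypothetical; good and spam here are stubs standing in for the real methods, which would count words.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

class TrainSketch {
    // Hypothetical counters; the real good()/spam() would update the word table.
    int goodCalls = 0;
    int spamCalls = 0;

    void good(String content) { goodCalls++; }
    void spam(String content) { spamCalls++; }

    /** Reads the labelled file and dispatches every line. */
    public void train(String file) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(file), StandardCharsets.UTF_8);
        trainLines(lines);
    }

    /** Splits each line into label and message, then calls the matching method. */
    void trainLines(List<String> lines) {
        for (String line : lines) {
            // Limit 2: everything after the first run of whitespace is the message.
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length < 2) {
                continue; // blank line or label without a message
            }
            if (parts[0].equals("spam")) {
                spam(parts[1]);
            } else if (parts[0].equals("ham")) {
                good(parts[1]);
            }
            // Lines with any other label are silently ignored in this sketch.
        }
    }
}
```

Keeping the line handling in trainLines, separate from the file access in train, makes the dispatch logic easy to exercise with an in-memory list.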

After that, merging everything into a single method is trivial.
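One way the merged version might look is sketched below, with a boolean flag replacing the two near-duplicate methods. Several parts are assumptions, since the question does not show them: the values of splitregex and wordregex, the nested Word class (a stand-in with only the two counters), and the names MergedTrainer and learn.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class MergedTrainer {
    /** Minimal stand-in for the question's Word class. */
    static class Word {
        final String text;
        int good;
        int bad;
        Word(String text) { this.text = text; }
        void countGood() { good++; }
        void countBad() { bad++; }
    }

    // Assumed patterns; the question does not show the real ones.
    private final String splitregex = "\\W+";
    private final Pattern wordregex = Pattern.compile("\\w+");

    final Map<String, Word> words = new HashMap<>();
    int goodTotal = 0;
    int spamTotal = 0;

    /** Counts every word of one message as either spam or good. */
    void learn(String content, boolean isSpam) {
        for (String token : content.split(splitregex)) {
            String word = token.toLowerCase();
            if (!wordregex.matcher(word).matches()) {
                continue;
            }
            Word w = words.get(word);
            if (w == null) {
                w = new Word(word);
                words.put(word, w);
            }
            if (isSpam) {
                spamTotal++;
                w.countBad();
            } else {
                goodTotal++;
                w.countGood();
            }
        }
    }

    /** Trains from lines of the form "spam<TAB>text" or "ham<TAB>text". */
    void train(List<String> lines) {
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length < 2) {
                continue; // blank line or label without a message
            }
            if (parts[0].equals("spam")) {
                learn(parts[1], true);
            } else if (parts[0].equals("ham")) {
                learn(parts[1], false);
            }
        }
    }
}
```

The totals are fields here because they accumulate over many lines. The probability loop at the end of the question's spam method (calcBadProb(spamTotal)) would then presumably run once, after train has processed the whole file.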