Counting the frequency of words in a sorted list

Asked: 2014-10-15 18:07:04

Tags: java loops exception frequency

public static void frequencyFinder() throws FileNotFoundException, IOException {
    String foldername = ".../Meta_Oct/separate";
    File folder = new File(foldername);
    File[] listOfFiles = folder.listFiles();


    String line;
    for (int x = 0; x < listOfFiles.length; x++) {
        BufferedReader in = new BufferedReader(new FileReader(listOfFiles[x]));
        String filename = listOfFiles[x].getName();
        String language = filename.split("@")[0];
        String target = filename.split("@")[1];
        String source = filename.split("@")[2];
        int frequency = 0;

        while ((line = in.readLine()) != null) {
            lemma_match = line.split(";")[3];
            frequency = 1;
            while((in.readLine().split(";")[3]).equals(lemma_match)){                 
                frequency++;
                line = in.readLine();                    
            }

            System.out.println(target + ":" + source +":"+lemma_match + ":" + frequency);
            frequency = 0;                
            lemma_match = null;
        }


    }
}

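I need to count the frequency of the words in the last column. The problem is that the while loop skips some lines and eventually ends in a NullPointerException, and even up to that point it does not count all the frequencies. I have attached a sample file and the stack trace below.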

EN;GOVERNMENT;DISEASE;bristle at 
EN;GOVERNMENT;DISEASE;contract 
EN;GOVERNMENT;DISEASE;detect in 
EN;GOVERNMENT;DISEASE;detect in 
EN;GOVERNMENT;DISEASE;immunize against 
EN;GOVERNMENT;DISEASE;inherit from 
EN;GOVERNMENT;DISEASE;spread 
EN;GOVERNMENT;DISEASE;spread 
EN;GOVERNMENT;DISEASE;spread 
EN;GOVERNMENT;DISEASE;stave off 
EN;GOVERNMENT;DISEASE;stave off 
EN;GOVERNMENT;DISEASE;transmit 
EN;GOVERNMENT;DISEASE;treat 
EN;GOVERNMENT;DISEASE;treat 
EN;GOVERNMENT;DISEASE;treat as 
EN;GOVERNMENT;DISEASE;treat by 
EN;GOVERNMENT;DISEASE;ward off 

STACK TRACE:

GOVERNMENT:DISEASE:bristle at :1
GOVERNMENT:DISEASE:detect in :2
GOVERNMENT:DISEASE:spread :2
GOVERNMENT:DISEASE:stave off :1
Exception in thread "main" java.lang.NullPointerException
GOVERNMENT:DISEASE:treat :2
    at javaapplication6.FrequencyFinder.frequencyFinder(FrequencyFinder.java:53)
    at javaapplication6.FrequencyFinder.main(FrequencyFinder.java:26)
Java Result: 1

2 answers:

Answer 0 (score: 1)

The problem is in the following code:

    while ((line = in.readLine()) != null) { // here you read a line
        lemma_match = line.split(";")[3];
        frequency = 1;
        while((in.readLine().split(";")[3]).equals(lemma_match)){ // here you read
                                                                  // another line
            frequency++;
            line = in.readLine(); // here you read another line                   
        }

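Because this code reads a new line in three places, not every line that is read increments the frequency. For example, each iteration of the inner loop reads two lines but increments frequency only once. Even if you fix the inner loop, you will still lose lines: when the inner while loop ends, the line that failed its condition has already been consumed, and the outer while loop then reads yet another line.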

In addition, the inner while loop throws a NullPointerException, because it calls split on the result of in.readLine() without first checking that the result is not null.
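To illustrate the null check, here is a minimal sketch. The class `ReadLineNullCheck` and the helper `nextLemma` are hypothetical names introduced only for this example. It relies on the documented behaviour that `BufferedReader.readLine()` returns null at the end of the stream, and it assumes every line has at least four `;`-separated fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;

class ReadLineNullCheck {
    // Hypothetical helper: returns the 4th ';'-separated field of the next line,
    // or null once the reader is exhausted.
    static String nextLemma(BufferedReader in) {
        try {
            String line = in.readLine(); // null at end of stream
            if (line == null) {
                return null;             // check before calling split
            }
            return line.split(";")[3];   // assumes at least 4 fields per line
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `split` directly on the result of `readLine()`, as the inner loop in the question does, skips this check and fails on the first read past the last line.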

Now let's see how to do this with a single loop:

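    String lemma_match = "";
    while ((line = in.readLine()) != null) {
        String new_lemma_match = line.split(";")[3];
        if (!lemma_match.equals(new_lemma_match)) { // start count for a new lemma
            if (!lemma_match.equals("")) { // print the lemma that just ended
                System.out.println(target + ":" + source +":"+lemma_match + ":" + frequency);
            }
            lemma_match=new_lemma_match;
            frequency = 1; // initialize frequency for new lemma
        } else {
            frequency++; // increase frequency for current lemma
        }
    }
    // the loop only prints when the lemma changes, so print the last lemma here
    if (!lemma_match.equals("")) {
        System.out.println(target + ":" + source +":"+lemma_match + ":" + frequency);
    }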
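The single-loop idea can also be tried without any files. The following is a minimal, self-contained sketch, not the asker's actual program: the class `SortedLemmaCounter` and the method `countRuns` are hypothetical names. It assumes the lines are already in memory, already sorted by lemma, and have at least four `;`-separated fields each. It also trims the lemma, because the sample data has trailing spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortedLemmaCounter {
    // Counts runs of equal lemmas (4th ';'-separated field) in sorted lines.
    // Returns one "lemma:frequency" entry per run, in input order.
    static List<String> countRuns(List<String> lines) {
        List<String> result = new ArrayList<>();
        String current = null; // lemma of the run being counted, null before the first line
        int frequency = 0;
        for (String line : lines) {
            String lemma = line.split(";")[3].trim(); // assumes at least 4 fields
            if (lemma.equals(current)) {
                frequency++;                 // same lemma: extend the current run
            } else {
                if (current != null) {
                    result.add(current + ":" + frequency); // previous run is complete
                }
                current = lemma;             // start a new run
                frequency = 1;
            }
        }
        if (current != null) {
            result.add(current + ":" + frequency); // emit the final run
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "EN;GOVERNMENT;DISEASE;contract ",
            "EN;GOVERNMENT;DISEASE;spread ",
            "EN;GOVERNMENT;DISEASE;spread ",
            "EN;GOVERNMENT;DISEASE;treat ");
        for (String entry : countRuns(lines)) {
            System.out.println(entry);
        }
    }
}
```

Each line is read exactly once, so no line can be skipped, and the final run is emitted after the loop because no later lemma change would trigger it.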

Answer 1 (score: 0)

Keep adding entries to a HashMap, incrementing the value for each unique entry (key). At the end you will have your result.
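A minimal sketch of the map-based approach follows. The class `LemmaMapCounter` and the method `countLemmas` are hypothetical names, and the input is assumed to be a list of lines with at least four `;`-separated fields. It uses a `LinkedHashMap` rather than a plain `HashMap` only so that the results keep the order in which lemmas first appear; that choice is an assumption, not part of the answer.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LemmaMapCounter {
    // Maps each lemma (4th ';'-separated field) to the number of lines containing it.
    static Map<String, Integer> countLemmas(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String lemma = line.split(";")[3].trim(); // assumes at least 4 fields
            counts.merge(lemma, 1, Integer::sum);     // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "EN;GOVERNMENT;DISEASE;spread ",
            "EN;GOVERNMENT;DISEASE;spread ",
            "EN;GOVERNMENT;DISEASE;treat ");
        for (Map.Entry<String, Integer> e : countLemmas(lines).entrySet()) {
            System.out.println(e.getKey() + ":" + e.getValue());
        }
    }
}
```

Unlike the single-loop version, this does not depend on the input being sorted, at the cost of holding one map entry per distinct lemma in memory.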