public static void frequencyFinder() throws FileNotFoundException, IOException {
    String foldername = ".../Meta_Oct/separate";
    File folder = new File(foldername);
    File[] listOfFiles = folder.listFiles();
    String line;
    for (int x = 0; x < listOfFiles.length; x++) {
        BufferedReader in = new BufferedReader(new FileReader(listOfFiles[x]));
        String filename = listOfFiles[x].getName();
        String language = filename.split("@")[0];
        String target = filename.split("@")[1];
        String source = filename.split("@")[2];
        int frequency = 0;
        while ((line = in.readLine()) != null) {
            lemma_match = line.split(";")[3];
            frequency = 1;
            while((in.readLine().split(";")[3]).equals(lemma_match)){
                frequency++;
                line = in.readLine();
            }
            System.out.println(target + ":" + source +":"+lemma_match + ":" + frequency);
            frequency = 0;
            lemma_match = null;
        }
    }
}
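I need to count the frequency of the words in the last column. The problem is that the while loop skips some lines and eventually ends in a NullPointerException, so not all frequencies are counted up to that point. I have attached the stack trace and a sample file below.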
EN;GOVERNMENT;DISEASE;bristle at
EN;GOVERNMENT;DISEASE;contract
EN;GOVERNMENT;DISEASE;detect in
EN;GOVERNMENT;DISEASE;detect in
EN;GOVERNMENT;DISEASE;immunize against
EN;GOVERNMENT;DISEASE;inherit from
EN;GOVERNMENT;DISEASE;spread
EN;GOVERNMENT;DISEASE;spread
EN;GOVERNMENT;DISEASE;spread
EN;GOVERNMENT;DISEASE;stave off
EN;GOVERNMENT;DISEASE;stave off
EN;GOVERNMENT;DISEASE;transmit
EN;GOVERNMENT;DISEASE;treat
EN;GOVERNMENT;DISEASE;treat
EN;GOVERNMENT;DISEASE;treat as
EN;GOVERNMENT;DISEASE;treat by
EN;GOVERNMENT;DISEASE;ward off
STACK TRACE:
GOVERNMENT:DISEASE:bristle at :1
GOVERNMENT:DISEASE:detect in :2
GOVERNMENT:DISEASE:spread :2
GOVERNMENT:DISEASE:stave off :1
Exception in thread "main" java.lang.NullPointerException
GOVERNMENT:DISEASE:treat :2
at javaapplication6.FrequencyFinder.frequencyFinder(FrequencyFinder.java:53)
at javaapplication6.FrequencyFinder.main(FrequencyFinder.java:26)
Java Result: 1
Answer 0 (score: 1)
The problem is in the following code:
while ((line = in.readLine()) != null) { // here you read a line
    lemma_match = line.split(";")[3];
    frequency = 1;
    while((in.readLine().split(";")[3]).equals(lemma_match)){ // here you read another line
        frequency++;
        line = in.readLine(); // here you read another line
    }
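Because this code reads a new line in three places, frequency is not incremented for every line that is read. For example, each iteration of the inner loop reads two lines but increments frequency only once. Even if you fix the inner loop, you would still lose lines: when the inner while loop ends, the line that ended it has already been consumed, and the outer while loop then reads yet another line.
In addition, the inner while loop will throw a NullPointerException, because you call split on the result of in.readLine() without first checking that it is not null.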
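To illustrate the null check described above, here is a minimal sketch. The class name NullCheckDemo and the helper nextLemma are hypothetical names introduced only for this example; the one fact it relies on is that BufferedReader.readLine() returns null once the end of the stream is reached.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical demo class, not part of the original program.
class NullCheckDemo {
    // Returns the 4th ';'-separated field of the next line,
    // or null when there are no more lines to read.
    static String nextLemma(BufferedReader in) throws IOException {
        String next = in.readLine();   // null once the stream is exhausted
        if (next == null) {
            return null;               // guard: never call split on null
        }
        return next.split(";")[3];
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new StringReader("EN;GOVERNMENT;DISEASE;treat\n"));
        System.out.println(nextLemma(in)); // first call: the lemma of the only line
        System.out.println(nextLemma(in)); // second call: end of input
    }
}
```

Note that the guard only removes the exception; the skipped lines are a separate problem caused by the extra readLine() calls.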
Now let's see how we can achieve this with a single loop:
String lemma_match = "";
while ((line = in.readLine()) != null) {
    String new_lemma_match = line.split(";")[3];
    if (!lemma_match.equals(new_lemma_match)) { // start count for a new lemma
        if (!lemma_match.equals("")) {
            System.out.println(target + ":" + source +":"+lemma_match + ":" + frequency);
        }
        lemma_match=new_lemma_match;
        frequency = 1; // initialize frequency for new lemma
    } else {
        frequency++; // increase frequency for current lemma
    }
}
if (!lemma_match.equals("")) { // print the last lemma, which the loop itself never prints
    System.out.println(target + ":" + source +":"+lemma_match + ":" + frequency);
}
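For reference, a self-contained sketch of the single-loop idea that can be run without any files. It assumes the input is sorted so that equal lemmas are adjacent, as in the sample file. The class name RunCounter, the method countRuns, and the in-memory sample are assumptions made for this example. The result for the final group is added after the loop, because no later line arrives to trigger it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original program.
class RunCounter {
    // Counts runs of consecutive equal lemmas (4th ';'-separated field).
    // Assumes equal lemmas are adjacent, i.e. the input is sorted by lemma.
    static List<String> countRuns(BufferedReader in) throws IOException {
        List<String> results = new ArrayList<>();
        String current = null;   // lemma of the run being counted
        int frequency = 0;
        String line;
        while ((line = in.readLine()) != null) {   // the only readLine() call
            String lemma = line.split(";")[3];
            if (lemma.equals(current)) {
                frequency++;                       // same run continues
            } else {
                if (current != null) {
                    results.add(current + ":" + frequency); // close previous run
                }
                current = lemma;                   // start a new run
                frequency = 1;
            }
        }
        if (current != null) {
            results.add(current + ":" + frequency); // close the final run
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        String sample = "EN;GOVERNMENT;DISEASE;spread\n"
                + "EN;GOVERNMENT;DISEASE;spread\n"
                + "EN;GOVERNMENT;DISEASE;spread\n"
                + "EN;GOVERNMENT;DISEASE;stave off\n"
                + "EN;GOVERNMENT;DISEASE;stave off\n"
                + "EN;GOVERNMENT;DISEASE;transmit\n";
        try (BufferedReader in = new BufferedReader(new StringReader(sample))) {
            countRuns(in).forEach(System.out::println);
        }
    }
}
```

Using null instead of the empty string as the "no lemma yet" marker is a design choice of this sketch; it avoids miscounting if a file ever contains an empty lemma field.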
Answer 1 (score: 0)
Keep adding entries to a HashMap, incrementing the value for each unique entry (key). At the end you will have your result.
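A minimal sketch of that map-based approach, under some assumptions: the class name LemmaMapCounter and the method countLemmas are hypothetical, the lines are passed in as a list rather than read from a file, and a LinkedHashMap is swapped in for a plain HashMap only so the output keeps first-seen order. Unlike the single-loop version, this does not require the input to be sorted.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of the original program.
class LemmaMapCounter {
    // Maps each lemma (4th ';'-separated field) to its number of occurrences.
    static Map<String, Integer> countLemmas(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps insertion order
        for (String line : lines) {
            String lemma = line.split(";")[3];
            counts.merge(lemma, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "EN;GOVERNMENT;DISEASE;treat",
                "EN;GOVERNMENT;DISEASE;treat",
                "EN;GOVERNMENT;DISEASE;treat as",
                "EN;GOVERNMENT;DISEASE;ward off");
        countLemmas(lines).forEach(
                (lemma, count) -> System.out.println(lemma + ":" + count));
    }
}
```

The trade-off is memory: the map holds one entry per distinct lemma, whereas the single-loop version only keeps the current lemma and its count.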