class ReadPDF {
    public void Read() throws IOException {
        int amountOfWords = 0;
        int amountOfChars = 0;
        String sourceCode = "";
        try {
            PDDocument doc = PDDocument.load(new File("C:\\Users\\ccw\\Desktop\\articles\\RECYCLING-BEHAVIOUR-AMONG-MALAYSIAN-TERTIARY-STUDENTS.pdf"));
            String text = new PDFTextStripper().getText(doc);
            sourceCode = sourceCode.replace("-", "").replace(".", "");
            while (doc != null) {
                String[] words = sourceCode.split(" ");
                amountOfWords = amountOfWords + words.length;
                for (String word : words) {
                    amountOfChars = amountOfChars + word.length();
                }
            }
            System.out.println("Amount of Chars is " + amountOfChars);
            System.out.println("Amount of Words is " + (amountOfWords + 1));
            System.out.println("Average Word Length is " + (amountOfChars / amountOfWords));
        } catch (IOException e) {
            System.out.println(e);
        }
    }
}
I am trying to count all the words and characters in a PDF file using pdfbox. But now I get an error saying that sourceCode is not initialized.
Answer 0 (score: 1)
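Replace this line:
sourceCode = sourceCode.replace("-", "").replace(".", "");
with:
sourceCode = text.replace("-", "").replace(".", "");
and remove the while loop. In the original code the extracted text is never copied into sourceCode, so it stays empty, and doc never becomes null inside the loop, so the loop never terminates.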
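With both changes applied, the counting step could look roughly like the sketch below. It is only an illustration, not the asker's final code: the class name PdfTextStats and its helper methods are hypothetical, and the input is assumed to be the String returned by new PDFTextStripper().getText(doc) as in the question. To keep the sketch runnable without pdfbox on the classpath, main uses a hard-coded sample string instead of a real PDF. It also splits on any run of whitespace rather than a single space (extracted PDF text usually contains line breaks) and guards the average against division by zero; both are assumptions beyond what the answer states.

```java
class PdfTextStats {

    // Removes '-' and '.' as in the question, then splits on runs of whitespace.
    // Returns an empty array when no words remain.
    static String[] splitWords(String text) {
        String cleaned = text.replace("-", "").replace(".", "").trim();
        if (cleaned.isEmpty()) {
            return new String[0];
        }
        return cleaned.split("\\s+");
    }

    // Sums the lengths of all words (whitespace is not counted).
    static int countChars(String[] words) {
        int total = 0;
        for (String word : words) {
            total += word.length();
        }
        return total;
    }

    // Average word length, or 0 when there are no words.
    static double averageWordLength(String[] words) {
        if (words.length == 0) {
            return 0.0;
        }
        return (double) countChars(words) / words.length;
    }

    public static void main(String[] args) {
        // Stand-in for: String text = new PDFTextStripper().getText(doc);
        String text = "Recycling behaviour among tertiary students.\nA short sample.";

        String[] words = splitWords(text);
        System.out.println("Amount of Chars is " + countChars(words));
        System.out.println("Amount of Words is " + words.length);
        System.out.println("Average Word Length is " + averageWordLength(words));
    }
}
```

When reading from a real PDF, the PDDocument should also be closed once the text has been extracted, for example by opening it in a try-with-resources statement.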