Cannot count words and characters using PDFBox in Java

Time: 2018-11-23 04:59:29

Tags: java pdfbox

class ReadPDF {


    public void Read() throws IOException {

        int amountOfWords = 0;
        int amountOfChars = 0;
        String sourceCode ="";

        try {
            PDDocument doc = PDDocument.load(new File("C:\\Users\\ccw\\Desktop\\articles\\RECYCLING-BEHAVIOUR-AMONG-MALAYSIAN-TERTIARY-STUDENTS.pdf"));
            String text = new PDFTextStripper().getText(doc);

            sourceCode = sourceCode.replace ("-", "").replace (".", "");

            while(doc!=null){
                String[] words = sourceCode.split(" ");
                amountOfWords = amountOfWords + words.length;
                for (String word : words) {
                    amountOfChars = amountOfChars + word.length();
                }
            }

            System.out.println("Amount of Chars is " + amountOfChars);
            System.out.println("Amount of Words is " + (amountOfWords + 1));
            System.out.println("Average Word Length is "+ (amountOfChars/amountOfWords));


        }catch (IOException e) {
            System.out.println(e);
        }

    }

}

I am trying to count all the words and characters in a PDF file using PDFBox. But right now I get an error saying that sourceCode is not initialized.

1 answer:

Answer 0: (score: 1)

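Replace the line sourceCode = sourceCode.replace ("-", "").replace (".", ""); with sourceCode = text.replace ("-", "").replace (".", ""); and remove the while loop. In the original code sourceCode is still an empty string at that point, so the text extracted from the PDF is never used, and because doc never becomes null the while loop would never terminate.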
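
To make the counting step concrete, here is a minimal sketch of how it might look once the extracted text is actually used. It is not the asker's final code: the class name TextStats and its methods are hypothetical, and the input string is assumed to be whatever new PDFTextStripper().getText(doc) returned in the question. PDFBox itself is left out so the sketch compiles on its own. It also splits on any run of whitespace instead of a single space, an assumed refinement, since text extracted from a PDF usually contains line breaks.

```java
public class TextStats {

    // Hypothetical helper: strips "-" and "." as in the question, then splits
    // on runs of whitespace so newlines and double spaces do not create
    // empty "words".
    static String[] words(String text) {
        String cleaned = text.replace("-", "").replace(".", "").trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    static int countWords(String text) {
        return words(text).length;
    }

    // Counts only the characters that belong to words (no whitespace).
    static int countChars(String text) {
        int total = 0;
        for (String word : words(text)) {
            total += word.length();
        }
        return total;
    }

    // Uses floating-point division and guards against an empty document.
    static double averageWordLength(String text) {
        int n = countWords(text);
        return n == 0 ? 0.0 : (double) countChars(text) / n;
    }

    public static void main(String[] args) {
        // Assumption: in the real program this string would come from
        // new PDFTextStripper().getText(doc), with doc closed afterwards.
        String text = "Recycling behaviour among\nMalaysian tertiary students.";
        System.out.println("Amount of Chars is " + countChars(text));
        System.out.println("Amount of Words is " + countWords(text));
        System.out.println("Average Word Length is " + averageWordLength(text));
    }
}
```

Keeping the counting in plain static methods that take a String means it can be checked without a PDF file at hand. The empty-text guard avoids the division by zero that amountOfChars/amountOfWords would hit on an empty document.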