Parsing text from a PDF in Java

Time: 2016-02-10 23:48:04

Tags: java sqlite parsing pdf

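I'd like to ask how I can parse this text. I have already used PDFBox to extract the text of a PDF file as plain text, and I print it to the console. For example, this one: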

SHA256: 51c11994540537b633cf91b276b3c34556695ed870a5d3f7451e993262a4a745
File name: ACleaner.zip
Detection ratio: 0 / 55
Analysis date: 2015-07-21 12:23:19 UTC ( 8 minutes ago )
0 0
? Analysis ? File detail ? Additional information ? Comments  0 ? Votes
MD5  fffa183f43766ed39d411cb5f48dbc87
SHA1  b0d40fbc6c722d59031bb488455f89ba086eacd9
SHA256  51c11994540537b633cf91b276b3c34556695ed870a5d3f7451e993262a4a745

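I need to get some of the values, for example the MD5 value, the file name, and so on. Can I do this in Java? Thanks a lot.

What I've tried so far: I added this: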

String keySHA256 = "SHA256:";
private static String SHA256Value = null;

if (line.contains(keySHA256)) {
    //  System.out.println(line);
    int length = keySHA256.length();
    SHA256Value = line.substring(length);
    System.out.println("SHA256 >>>>" + SHA256Value);
}

But sometimes it doesn't get the correct value. Please help.
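The question doesn't say which lines come out wrong, but the sample suggests a few likely causes. `line.substring(length)` assumes the key starts at index 0, so any leading whitespace or text before the key shifts the result. The value is never trimmed. Also, `contains("SHA256:")` only matches the first SHA256 line, because the later `SHA256` line in the sample has no colon. Below is a minimal sketch of a more tolerant lookup. It assumes each field starts its own line with its label, followed by an optional colon and then whitespace. `LabelValue` and `valueAfter` are hypothetical names used only for illustration.

```java
public class LabelValue {

    /**
     * Hypothetical helper: returns the trimmed text after `label` when the line
     * (ignoring surrounding whitespace) starts with that label, else null.
     * Accepts both "SHA256: value" and "SHA256  value".
     */
    static String valueAfter(String line, String label) {
        String trimmed = line.trim();
        if (!trimmed.startsWith(label)) {
            return null; // label is not at the start of this line
        }
        String rest = trimmed.substring(label.length());
        // Require a separator right after the label, so "SHA1" does not match "SHA1X..."
        if (rest.isEmpty() || (rest.charAt(0) != ':' && !Character.isWhitespace(rest.charAt(0)))) {
            return null;
        }
        rest = rest.trim();
        if (rest.startsWith(":")) {
            rest = rest.substring(1).trim(); // drop the optional colon
        }
        return rest.isEmpty() ? null : rest;
    }

    public static void main(String[] args) {
        String[] lines = {
            "SHA256: 51c11994540537b633cf91b276b3c34556695ed870a5d3f7451e993262a4a745",
            "File name: ACleaner.zip",
            "MD5  fffa183f43766ed39d411cb5f48dbc87",
            "SHA256  51c11994540537b633cf91b276b3c34556695ed870a5d3f7451e993262a4a745"
        };
        for (String line : lines) {
            String value = valueAfter(line, "SHA256");
            if (value != null) {
                System.out.println("SHA256 >>>>" + value);
            }
        }
    }
}
```

Checking `startsWith` on the trimmed line, rather than `contains`, also keeps a label from matching when it appears somewhere inside another field's text.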

1 Answer

Answer 0 (score: 2)

This might be a better example for getting started with Java IO and String parsing. Google is your friend.

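import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadLines {
    public static void main(String[] args) {
        // path to your file, e.g. the text you extracted from the PDF
        String fileName = "C:\\lines.txt";
        // read the file through a BufferedReader; try-with-resources closes it for you
        try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = br.readLine()) != null) { // iterate over each line of the file
                System.out.println(line); // print it if you want
                // split the line into words; "\\s+" treats a run of spaces as one separator,
                // so "MD5  <hash>" (two spaces) gives ["MD5", "<hash>"] with no empty string between
                String[] split = line.trim().split("\\s+");
                // add any checks or extra processing here
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}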
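To pull specific fields out of each line, a regular expression is often more convenient than `split`. Below is a minimal sketch. It assumes the labels look exactly like those in the question's sample (`SHA256`, `SHA1`, `MD5`, `File name`, `Detection ratio`, `Analysis date`), and that each label starts its own line and is followed by an optional colon and whitespace. `ReportParser` and `parse` are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReportParser {

    // Assumed labels, taken from the sample output in the question; extend as needed.
    // A line matches if it is: optional whitespace, a label, optional ':', whitespace, value.
    private static final Pattern FIELD = Pattern.compile(
            "\\s*(SHA256|SHA1|MD5|File name|Detection ratio|Analysis date)\\s*:?\\s+(.+)");

    /** Collects the first value seen for each known label, preserving input order. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = FIELD.matcher(line);
            if (m.matches()) { // the whole line must fit the pattern
                fields.putIfAbsent(m.group(1), m.group(2).trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "SHA256: 51c11994540537b633cf91b276b3c34556695ed870a5d3f7451e993262a4a745",
                "File name: ACleaner.zip",
                "Detection ratio: 0 / 55",
                "? Analysis ? File detail ? Additional information ? Comments  0 ? Votes",
                "MD5  fffa183f43766ed39d411cb5f48dbc87");
        parse(lines).forEach((label, value) -> System.out.println(label + " = " + value));
    }
}
```

`putIfAbsent` keeps the first value, since `SHA256` appears twice in the sample. With the answer's loop, you could add each `line` to a `List<String>` and pass that list to `parse`. Alternatively, you could read the whole file with `java.nio.file.Files.readAllLines(path)`.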