PDFParser with Apache Tika: printing author, title, etc.

Time: 2015-10-28 17:15:03

Tags: apache-tika

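import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public static void main(String[] args) throws IOException, TikaException, SAXException {
    BodyContentHandler handler = new BodyContentHandler();
    Metadata metadata = new Metadata();
    // try-with-resources closes the stream even if parsing fails
    try (InputStream input = new FileInputStream("/home/alican/Downloads/solr-4.10.2/example/solr/senior/solr-word.pdf")) {
        new PDFParser().parse(input, handler, metadata, new ParseContext());
    }
    System.out.println(handler.toString());  // extracted body text
    System.out.println(metadata.toString()); // every metadata key=value pair
}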

I can print both the content of the PDF and its metadata. When I print metadata.toString(), the output looks like this:

access_permission:extract_for_accessibility=true meta:save-date=2008-11-13T13:35:51Z dc:subject=solr, word, pdf subject=solr word dcterms:created=2008-11-13T13:35:51Z Author=Grant Ingersoll date=2008-11-13T13:35:51Z 
... (and so on)

How can I select only the author, the title, and the number of pages?

Edit: solution:

// getValues returns every value stored under the key, since a property can be multi-valued
String[] author = metadata.getValues(Metadata.AUTHOR);
System.out.println(Arrays.toString(author));
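
The question also asks for the title and the number of pages, so the same lookup can be generalized. Below is a minimal sketch of a helper that tries several candidate keys in order and returns the first non-blank value. `MetadataFields` and `firstPresent` are hypothetical names invented for this illustration, not part of Tika. The key names in the usage comment are assumptions as well: "Author" appears in the output above, while "dc:creator", "dc:title", "title", and "xmpTPg:NPages" may differ between Tika versions, so check them against what metadata.toString() actually prints.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper (not part of Tika): looks up the first non-blank value
// among several candidate metadata keys.
//
// Possible usage with a populated Tika Metadata object (key names assumed):
//   MetadataFields.firstPresent(metadata::get, "dc:creator", "Author");
//   MetadataFields.firstPresent(metadata::get, "dc:title", "title");
//   MetadataFields.firstPresent(metadata::get, "xmpTPg:NPages");
final class MetadataFields {
    private MetadataFields() {}

    // lookup maps a key to its value, or to null when the key is absent.
    static Optional<String> firstPresent(Function<String, String> lookup, String... keys) {
        for (String key : keys) {
            String value = lookup.apply(key);
            if (value != null && !value.trim().isEmpty()) {
                return Optional.of(value.trim());
            }
        }
        return Optional.empty();
    }
}
```

Passing the lookup as a function keeps the helper free of Tika imports, so it can also be exercised with a plain Map. An empty Optional signals that none of the candidate keys held a usable value.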

0 answers:

No answers