Extracting colored text with PDFBox

Date: 2013-07-10 11:33:50

Tags: java fonts colors extract pdfbox

I am trying this code:

PDDocument doc = null;

    try 
    {
        doc = PDDocument.load("C:/Users/bcalvo/Desktop/leon/20130710.pdf");
        //doc = PDDocument.load("C:/Users/bcalvo/Desktop/color.pdf");
        PDFStreamEngine engine = new PDFStreamEngine(ResourceLoader.loadProperties("org/apache/pdfbox/resources/PageDrawer.properties"));
        PDPage page = (PDPage)doc.getDocumentCatalog().getAllPages().get(0);
        engine.processStream(page, page.findResources(), page.getContents().getStream());
        PDGraphicsState graphicState = engine.getGraphicsState();

        System.out.println("color: " + graphicState.getStrokingColor().getColorSpace().getName());

        //System.out.println("color: " + graphicState.getStrokingColor().getJavaColor() );

        float colorSpaceValues[] = graphicState.getStrokingColor().getColorSpaceValue();

        for (float c : colorSpaceValues) 
            System.out.println(c * 255);
    }
    finally 
    {
        if (doc != null) 
            doc.close();
    }       
}

When I run it, I get the following error:

jul 10, 2013 1:23:31 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
WARNING: java.lang.ClassCastException: org.apache.pdfbox.util.PDFStreamEngine cannot be cast to org.apache.pdfbox.pdfviewer.PageDrawer
java.lang.ClassCastException: org.apache.pdfbox.util.PDFStreamEngine cannot be cast to org.apache.pdfbox.pdfviewer.PageDrawer
    at org.apache.pdfbox.util.operator.pagedrawer.CurveTo.process(CurveTo.java:45)
    at org.apache.pdfbox.util.PDFStreamEd...
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at com.prueba.ExtractColorFonts.main(ExtractColorFonts.java:26)

The line the error points to is:

engine.processStream(page, page.findResources(), page.getContents().getStream());
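
Judging only from the stack trace, the cast seems to happen inside the operator class CurveTo, which apparently expects the engine that runs it to be a PageDrawer, while I am passing a plain PDFStreamEngine. Below is a minimal plain-Java sketch of what I think is going on. The names CastSketch, Engine, Drawer, process and tryProcess are hypothetical stand-ins I made up for illustration; they are not the real PDFBox classes, and the real operator code may differ.

```java
// Hypothetical stand-ins for illustration; these are NOT the real PDFBox classes.
class CastSketch {
    // Plays the role of the generic engine (PDFStreamEngine in the trace).
    static class Engine {
        String name() { return "Engine"; }
    }

    // Plays the role of the drawing subclass (PageDrawer in the trace).
    static class Drawer extends Engine {
        @Override
        String name() { return "Drawer"; }
    }

    // Plays the role of a drawing operator such as CurveTo:
    // it assumes its context is always a Drawer and downcasts it.
    static String process(Engine context) {
        Drawer drawer = (Drawer) context; // throws if context is a plain Engine
        return "drew with " + drawer.name();
    }

    // Runs the operator and reports whether the downcast succeeded.
    static String tryProcess(Engine context) {
        try {
            return process(context);
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryProcess(new Drawer())); // the cast succeeds
        System.out.println(tryProcess(new Engine())); // the cast fails
    }
}
```

If that reading is right, the operators listed in PageDrawer.properties would only work when the engine itself is a PageDrawer (or a subclass of it), which would explain why my plain PDFStreamEngine fails.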

Does anyone know how to fix this error?

0 answers:

No answers yet