How to extract the color of a rectangle using iText

Date: 2017-01-18 20:28:21

Tags: java pdf itext

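I am trying to extract the color of a rectangle in a PDF using iText. This is the complete PDF page:

[Image: screenshot of the PDF page]

This is the page content as extracted with iText: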

q
BT
36 806 Td
0 -18 Td
/F1 12 Tf
(Option 1:)Tj
0 0 Td
0 -94.31 Td
ET
Q
q
Q
q
2 J
0 G
0.5 w
88.3 693.69 139.47 94.31 re
S
0.5 w
227.77 693.69 139.47 94.31 re
S
0.5 w
367.23 693.69 139.47 94.31 re
S
Q
BT
1 0 0 1 90.3 774 Tm
/F1 12 Tf
(A rectangle:)Tj
ET
q 1.13 0 0 1.13 229.77 695.69 cm /Xf1 Do Q
BT
1 0 0 1 369.23 774 Tm
/F1 12 Tf
(The rectangle is scaled)Tj
1 0 0 1 369.23 762 Tm
(to fit inside the cell, you)Tj
1 0 0 1 369.23 750 Tm
(see a padding.)Tj
ET
228 810 m
338 810 l
S

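However, there is one thing I cannot extract from this content: the red color. If I generate the same PDF with another color instead of red, the page content shown above does not change at all.

So my question is: how can I extract that color using some method or property of the iText library for Java?

I am using iText 5.5.9.

Thanks for any help you can provide!

This is the code I use to generate the sample PDF: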

String dest = "C:\\TestCreation.pdf";
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(dest));
document.open();

document.add(new Paragraph("Option 1:"));
PdfPTable table = new PdfPTable(3);
table.addCell("A rectangle:");
PdfTemplate template = writer.getDirectContent().createTemplate(120, 80);
template.setColorFill(BaseColor.RED);
template.rectangle(0, 0, 120, 80);
template.fill();
writer.releaseTemplate(template);
table.addCell(Image.getInstance(template));
table.addCell("The rectangle is scaled to fit inside the cell, you see a padding.");
document.add(table);

PdfContentByte cb = writer.getDirectContent();
cb.moveTo(228, 810);
cb.lineTo(338, 810);
cb.stroke();
document.close();

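You can see the PDF file here: PDF example

This is the line of code I use to get the page content: String pageContent = new String(reader.getPageContent(1));

I have been inspecting all the objects of the reader. I am able to find the rectangle, but not its color:

[Image: screenshot]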

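2 answers:

Answer 0 (score: 3)

To find the color of the rectangle, you may need to look through the /Annots section of the PDF stream. Here you are only exploring /Contents, which does not contain information such as the color of the Rect entity.

I hope this helps :)

Answer 1 (score: 2)

Your own code shows it. This is how you create the rectangle and add it: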

PdfTemplate template = writer.getDirectContent().createTemplate(120, 80);
template.setColorFill(BaseColor.RED);
template.rectangle(0, 0, 120, 80);
template.fill();
writer.releaseTemplate(template);
table.addCell(Image.getInstance(template));

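An iText PdfTemplate generates a PDF form XObject. A form XObject in turn is a PDF content stream that is a self-contained description of any sequence of graphics objects (including path objects, text objects, and sampled images) (section 8.10.1 of ISO 32000-1), i.e. a separate stream of drawing instructions whose content can be referenced from any other content stream.

In the page content stream, this is the line that includes the form XObject: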

q 1.13 0 0 1.13 229.77 695.69 cm /Xf1 Do Q

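(The graphics state is saved, the transformation matrix is changed to scale by 1.13 and to translate a bit, the XObject Xf1 is drawn, and then the graphics state, including the transformation matrix, is restored.)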
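The save, transform, restore sequence can be modelled with a small stack. The following is only a sketch in plain Java, assuming the matrix is the identity before the `q`; the class and its methods are hypothetical names for illustration and are not part of iText.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Illustrative only: a toy model of the CTM part of the PDF graphics state, not iText API.
final class GraphicsStateStackDemo {
    private double[] ctm = { 1, 0, 0, 1, 0, 0 };           // identity matrix "a b c d e f"
    private final Deque<double[]> saved = new ArrayDeque<>();

    void save()    { saved.push(ctm.clone()); }             // models the "q" operator
    void restore() { ctm = saved.pop(); }                   // models the "Q" operator

    // Models "a b c d e f cm": the new CTM is the operand matrix times the old CTM.
    void concat(double a, double b, double c, double d, double e, double f) {
        double[] o = ctm;
        ctm = new double[] {
            a * o[0] + b * o[2],        a * o[1] + b * o[3],
            c * o[0] + d * o[2],        c * o[1] + d * o[3],
            e * o[0] + f * o[2] + o[4], e * o[1] + f * o[3] + o[5]
        };
    }

    double[] ctm() { return ctm.clone(); }

    public static void main(String[] args) {
        GraphicsStateStackDemo gs = new GraphicsStateStackDemo();
        gs.save();                                           // q
        gs.concat(1.13, 0, 0, 1.13, 229.77, 695.69);         // 1.13 0 0 1.13 229.77 695.69 cm
        System.out.println("while drawing Xf1: " + Arrays.toString(gs.ctm()));
        gs.restore();                                        // Q
        System.out.println("after Q:           " + Arrays.toString(gs.ctm()));
    }
}
```

Because the matrix is back to the identity after `Q`, the `228 810 m` and `338 810 l` operands further down in the page content are plain page coordinates.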

The content stream of the XObject Xf1 is:

1 0 0 rg
0 0 120 80 re
f

I.e. it sets the non-stroking color to RGB red, defines a 120x80 rectangle at the origin, and fills it.
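To see where that rectangle ends up on the page, the matrix from the `cm` operator can be applied by hand to the corners of the rectangle from the `re` operator. The following is a minimal sketch in plain Java, assuming the matrix is the identity before the `cm`; `CtmDemo` and its methods are made-up names for illustration, not iText API.

```java
import java.util.Locale;

// Illustrative only: CtmDemo and its methods are hypothetical helpers, not iText API.
final class CtmDemo {

    // A PDF matrix "a b c d e f" maps (x, y) to (a*x + c*y + e, b*x + d*y + f).
    static double[] apply(double[] m, double x, double y) {
        return new double[] { m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5] };
    }

    // Expands "x y w h re" into its four corners and transforms each of them.
    static double[] transformRectangle(double[] m, double x, double y, double w, double h) {
        double[][] corners = { { x, y }, { x + w, y }, { x + w, y + h }, { x, y + h } };
        double[] result = new double[8];
        for (int i = 0; i < corners.length; i++) {
            double[] p = apply(m, corners[i][0], corners[i][1]);
            result[2 * i] = p[0];
            result[2 * i + 1] = p[1];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] cm = { 1.13, 0, 0, 1.13, 229.77, 695.69 };  // from the page content stream
        double[] corners = transformRectangle(cm, 0, 0, 120, 80); // from the Xf1 content stream
        StringBuilder sb = new StringBuilder();
        for (double v : corners) {
            sb.append(String.format(Locale.ROOT, "%.2f ", v));
        }
        System.out.println(sb.toString().trim());
    }
}
```

Rounded to two decimals this prints `229.77 695.69 365.37 695.69 365.37 786.09 229.77 786.09`, i.e. the red rectangle sits inside the second table cell (227.77 693.69 139.47 94.31 re) with a small padding.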

  

Quoting the question: "This is the line of code I use to get the page content:"

String pageContent = new String(reader.getPageContent(1));

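That line is not sufficient to get all the content details:

  1. It returns only the immediate page content, not the detailed instructions in form XObjects and patterns used from the immediate content. Quite often one finds PDFs whose immediate page content merely references one or more form XObjects.

  2. Despite appearances, page content is binary in nature, not text. As soon as fonts with a non-standard encoding are used, PDF string contents will be meaningless in your Java string or (depending on your default encoding) even broken.

Instead, one should use the iText parser framework, e.g. like this: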

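    // Imports needed by this snippet (iText 5.5.x):
    import java.io.InputStream;
    import java.lang.reflect.Field;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;
    import com.itextpdf.text.BaseColor;
    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.parser.*;

    // Listener that collects path construction operations and reports each painted path.
    ExtRenderListener extRenderListener = new ExtRenderListener()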
    {
        @Override
        public void beginTextBlock()                        {   }
        @Override
        public void renderText(TextRenderInfo renderInfo)   {   }
        @Override
        public void endTextBlock()                          {   }
        @Override
        public void renderImage(ImageRenderInfo renderInfo) {   }
    
        @Override
        public void modifyPath(PathConstructionRenderInfo renderInfo)
        {
            pathInfos.add(renderInfo);
        }
    
        @Override
        public Path renderPath(PathPaintingRenderInfo renderInfo)
        {
            GraphicsState graphicsState;
            try
            {
                graphicsState = getGraphicsState(renderInfo);
            }
            catch (NoSuchFieldException | SecurityException | IllegalArgumentException | IllegalAccessException e)
            {
                e.printStackTrace();
                return null;
            }
    
            Matrix ctm = graphicsState.getCtm();
    
            if ((renderInfo.getOperation() & PathPaintingRenderInfo.FILL) != 0)
            {
                System.out.printf("FILL (%s) ", toString(graphicsState.getFillColor()));
                if ((renderInfo.getOperation() & PathPaintingRenderInfo.STROKE) != 0)
                    System.out.print("and ");
            }
            if ((renderInfo.getOperation() & PathPaintingRenderInfo.STROKE) != 0)
            {
                System.out.printf("STROKE (%s) ", toString(graphicsState.getStrokeColor()));
            }
    
            System.out.print("the path ");
    
            for (PathConstructionRenderInfo pathConstructionRenderInfo : pathInfos)
            {
                switch (pathConstructionRenderInfo.getOperation())
                {
                case PathConstructionRenderInfo.MOVETO:
                    System.out.printf("move to %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                    break;
                case PathConstructionRenderInfo.CLOSE:
                    System.out.printf("close %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                    break;
                case PathConstructionRenderInfo.CURVE_123:
                    System.out.printf("curve123 %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                    break;
                case PathConstructionRenderInfo.CURVE_13:
                    System.out.printf("curve13 %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                    break;
                case PathConstructionRenderInfo.CURVE_23:
                    System.out.printf("curve23 %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                    break;
                case PathConstructionRenderInfo.LINETO:
                    System.out.printf("line to %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                    break;
                case PathConstructionRenderInfo.RECT:
                    System.out.printf("rectangle %s ", transform(ctm, expandRectangleCoordinates(pathConstructionRenderInfo.getSegmentData())));
                    break;
                }
            }
            System.out.println();
    
            pathInfos.clear();
            return null;
        }
    
        @Override
        public void clipPath(int rule)
        {
        }
    
        List<Float> transform(Matrix ctm, List<Float> coordinates)
        {
            List<Float> result = new ArrayList<>();
            for (int i = 0; i + 1 < coordinates.size(); i += 2)
            {
                Vector vector = new Vector(coordinates.get(i), coordinates.get(i + 1), 1);
                vector = vector.cross(ctm);
                result.add(vector.get(Vector.I1));
                result.add(vector.get(Vector.I2));
            }
            return result;
        }
    
        List<Float> expandRectangleCoordinates(List<Float> rectangle)
        {
            if (rectangle.size() < 4)
                return Collections.emptyList();
            return Arrays.asList(
                    rectangle.get(0), rectangle.get(1),
                    rectangle.get(0) + rectangle.get(2), rectangle.get(1),
                    rectangle.get(0) + rectangle.get(2), rectangle.get(1) + rectangle.get(3),
                    rectangle.get(0), rectangle.get(1) + rectangle.get(3)
                    );
        }
    
        String toString(BaseColor baseColor)
        {
            if (baseColor == null)
                return "DEFAULT";
            return String.format("%s,%s,%s", baseColor.getRed(), baseColor.getGreen(), baseColor.getBlue());
        }
    
        GraphicsState getGraphicsState(PathPaintingRenderInfo renderInfo) throws NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException
        {
            Field gsField = PathPaintingRenderInfo.class.getDeclaredField("gs");
            gsField.setAccessible(true);
            return (GraphicsState) gsField.get(renderInfo);
        }
    
        final List<PathConstructionRenderInfo> pathInfos = new ArrayList<>();
    };
    
    try (   InputStream resource = [RETRIEVE FILE TO PARSE AS INPUT STREAM])
    {
        PdfReader pdfReader = new PdfReader(resource);
    
        for (int page = 1; page <= pdfReader.getNumberOfPages(); page++)
        {
            System.out.printf("\nPage %s\n====\n", page);
    
            PdfReaderContentParser parser = new PdfReaderContentParser(pdfReader);
            parser.processContent(page, extRenderListener);
    
        }
    }
    

(ExtractPaths test method testExtractFromTestCreation)

For your sample file, this produces the following output:

    Page 1
    ====
    STROKE (0,0,0) the path rectangle [88.3, 693.69, 227.77, 693.69, 227.77, 788.0, 88.3, 788.0] 
    STROKE (0,0,0) the path rectangle [227.77, 693.69, 367.24, 693.69, 367.24, 788.0, 227.77, 788.0] 
    STROKE (0,0,0) the path rectangle [367.23, 693.69, 506.7, 693.69, 506.7, 788.0, 367.23, 788.0] 
    FILL (255,0,0) the path rectangle [229.77, 695.69, 365.37, 695.69, 365.37, 786.09, 229.77, 786.09] 
    STROKE (DEFAULT) the path move to [228.0, 810.0] line to [338.0, 810.0] 
    

iText represents color values as bytes (0-255) rather than in the unit range (0.0 - 1.0) used by PDF. That is why you see '(255,0,0)' for the PDF instruction '1 0 0 rg'.
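The relation between the two representations can be sketched in a few lines of plain Java. The helper below is hypothetical and not part of iText; in particular, the exact rounding rule is an assumption made for illustration.

```java
// Illustrative only: ColorDemo is a hypothetical helper, not part of iText.
final class ColorDemo {

    // Maps a PDF color component in [0.0, 1.0] to an integer in [0, 255].
    // Rounding to the nearest integer is an assumption; clamping guards against out-of-range input.
    static int toByteRange(double component) {
        double clamped = Math.max(0.0, Math.min(1.0, component));
        return (int) Math.round(clamped * 255);
    }

    // Converts the three operands of an "rg" (fill) or "RG" (stroke) operator.
    static int[] rgbFromOperands(double r, double g, double b) {
        return new int[] { toByteRange(r), toByteRange(g), toByteRange(b) };
    }

    public static void main(String[] args) {
        int[] red = rgbFromOperands(1, 0, 0); // operands of "1 0 0 rg"
        System.out.println(red[0] + "," + red[1] + "," + red[2]); // prints 255,0,0
    }
}
```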