Question

To find the actual size of an image on a PDF page, I use PDFBox and follow what is described in this SO answer. So basically I call

// Computes the actual location and dimensions of each image
PrintImageLocations renderer = new PrintImageLocations();

for (int i = 0; i < pageLimit; ++i) {
    PDPage page = pdf.getPage(i);

    renderer.processPage(page);
}

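where PrintImageLocations is taken from this PDFBox code example.

However, for the PDF document I use for testing (generated by GPL Ghostscript 910 (ps2write) from an image found on Wikipedia), the reported image size is 0 x 0, although the PDF can be imported into Gimp or LibreOffice Draw.

So I would like to know whether the code I currently use is a reliable way to find image sizes, and what could cause it to fail to find the proper image size?

The PDF used for this test can be found here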

==========

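EDIT: Following @Itai's comment, it appears that the condition if ("Do".equals(operation)) never holds, because no such operation is invoked. Consequently, processOperator in the superclass is called instead.

The only operations invoked are the following (I added System.err.println("Processing " + operation); before the condition in the overridden processOperator method):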

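Processing q
Processing cm
Processing gs
Processing q
Processing re
Processing W
Processing n
Processing rg
Processing re
Processing f
Processing cs
Processing scn
Processing re
Processing f
Processing Q
Processing Q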

==========

Any hint appreciated,

Answer 1

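As you have already found out, the reason for the 0x0 output is that the code from PrintImageLocations does not find the image at all.

PrintImageLocations does not find the image because it only looks for image usages in the page content and in form XObjects used from the page content (including nested ones). In the document at hand, however, the image is drawn in the content of a tiling pattern, and that pattern is used to fill a region in the page content.

To allow PDFBox to find this image, we have to extend the PrintImageLocations class a bit so that it also descends into pattern content streams, e.g. like this: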

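// Imports assume the PDFBox 2.0.x package layout
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.contentstream.operator.color.*;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.examples.util.PrintImageLocations;
import org.apache.pdfbox.pdmodel.graphics.color.PDColor;
import org.apache.pdfbox.pdmodel.graphics.pattern.PDAbstractPattern;
import org.apache.pdfbox.pdmodel.graphics.pattern.PDTilingPattern;

class PrintImageLocationsImproved extends PrintImageLocations {
    public PrintImageLocationsImproved() throws IOException {
        super();

        // Track the non-stroking (fill) color so the graphics state knows the current fill pattern
        addOperator(new SetNonStrokingColor());
        addOperator(new SetNonStrokingColorN());
        addOperator(new SetNonStrokingDeviceCMYKColor());
        addOperator(new SetNonStrokingDeviceGrayColor());
        addOperator(new SetNonStrokingDeviceRGBColor());
        addOperator(new SetNonStrokingColorSpace());
    }

    @Override
    protected void processOperator(Operator operator, List<COSBase> operands) throws IOException {
        String operation = operator.getName();
        if (fillOperations.contains(operation)) {
            // On a fill, look up the current fill color as a pattern in the resources
            PDColor color = getGraphicsState().getNonStrokingColor();
            PDAbstractPattern pattern = getResources().getPattern(color.getPatternName());
            if (pattern instanceof PDTilingPattern) {
                // Descend into the pattern content stream; images drawn there reach the "Do" handling
                processTilingPattern((PDTilingPattern) pattern, null, null);
            }
        }
        super.processOperator(operator, operands);
    }

    // All path-painting operators that fill: f, F, f*, b, b*, B, B*
    final List<String> fillOperations = Arrays.asList("f", "F", "f*", "b", "b*", "B", "B*");
}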

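(ExtractImageLocations inner class PrintImageLocationsImproved)

The tiling pattern in the document at hand is used as a fill pattern color, not as a stroke pattern color. Thus, PrintImageLocationsImproved has to register operator listeners for the non-stroking color operators so that the fill color is correctly updated in the graphics state.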

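Before delegating to the PrintImageLocations implementation, processOperator first checks whether the operator is a fill operation. If so, it inspects the current fill color. If that is a tiling pattern color, processOperator calls processTilingPattern, which is defined in PDFStreamEngine and starts a nested analysis of the pattern content stream. This eventually lets PrintImageLocationsImproved find the image.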
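To make the mechanism concrete without depending on PDFBox, here is a minimal toy model of it. Everything in it (the ToyOp and ToyStream classes, the pattern name R7, the simplified operator list) is hypothetical and only loosely mirrors the operator log from the question; the image name R8 is taken from the output reported in this answer. It ignores graphics state save/restore (q/Q) and is a sketch of the idea, not of PDFBox internals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class PatternDescentSketch {
    /** Toy instruction: an operator name plus at most one name operand (null if none). */
    public static final class ToyOp {
        final String name;
        final String operand;
        ToyOp(String name, String operand) { this.name = name; this.operand = operand; }
    }

    /** Toy content stream together with the tiling patterns it can reference by name. */
    public static final class ToyStream {
        final List<ToyOp> ops;
        final Map<String, ToyStream> patterns;
        ToyStream(List<ToyOp> ops, Map<String, ToyStream> patterns) {
            this.ops = ops;
            this.patterns = patterns;
        }
    }

    static final List<String> FILL_OPERATORS = Arrays.asList("f", "F", "f*", "b", "b*", "B", "B*");

    /** Collects the names of images drawn with "Do"; optionally descends into fill patterns. */
    public static List<String> findImages(ToyStream stream, boolean descendIntoPatterns) {
        List<String> found = new ArrayList<>();
        String fillPattern = null; // name of the current fill pattern, null for a plain color
        for (ToyOp op : stream.ops) {
            if ("scn".equals(op.name)) {
                fillPattern = op.operand;
            } else if ("rg".equals(op.name) || "g".equals(op.name) || "k".equals(op.name)) {
                fillPattern = null; // a device color replaces any pattern color
            } else if ("Do".equals(op.name)) {
                found.add(op.operand);
            } else if (descendIntoPatterns && fillPattern != null && FILL_OPERATORS.contains(op.name)) {
                ToyStream pattern = stream.patterns.get(fillPattern);
                if (pattern != null) {
                    found.addAll(findImages(pattern, true)); // nested analysis of the pattern content
                }
            }
        }
        return found;
    }

    /** Hypothetical page: no "Do" in the page content, the image is only drawn inside pattern R7. */
    public static ToyStream samplePage() {
        ToyStream pattern = new ToyStream(
                Arrays.asList(new ToyOp("q", null), new ToyOp("cm", null),
                        new ToyOp("Do", "R8"), new ToyOp("Q", null)),
                Collections.<String, ToyStream>emptyMap());
        List<ToyOp> pageOps = Arrays.asList(
                new ToyOp("q", null), new ToyOp("rg", null), new ToyOp("re", null), new ToyOp("f", null),
                new ToyOp("cs", null), new ToyOp("scn", "R7"), new ToyOp("re", null), new ToyOp("f", null),
                new ToyOp("Q", null));
        return new ToyStream(pageOps, Collections.singletonMap("R7", pattern));
    }

    public static void main(String[] args) {
        System.out.println(findImages(samplePage(), false)); // prints []
        System.out.println(findImages(samplePage(), true));  // prints [R8]
    }
}
```

Without the descent the toy page yields no image at all, which corresponds to the 0 x 0 result in the question; with it, the image inside the pattern is reported.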

Using PrintImageLocationsImproved like this

try (   PDDocument document = PDDocument.load(...)    )
{
    PrintImageLocations printer = new PrintImageLocationsImproved();
    int pageNum = 0;
    for( PDPage page : document.getPages() )
    {
        pageNum++;
        System.out.println( "Processing page: " + pageNum );
        printer.processPage(page);
    }
}

(ExtractImageLocations test testExtractLikeHelloWorldImprovedFromTopSecret)

Thus, for your PDF file the image is found:

Processing page: 1
*******************************************************************
Found image [R8]
position in PDF = 39.0, 102.48 in user space units
raw image size  = 1209, 1640 in pixels
displayed size  = 516.3119, 700.3752 in user space units
displayed size  = 7.1709986, 9.727433 in inches at 72 dpi rendering
displayed size  = 182.14336, 247.0768 in millimeters at 72 dpi rendering
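The inch and millimeter lines in this output are plain unit conversions of the displayed size in user space units. A minimal sketch of that arithmetic, assuming the default user unit of 1/72 inch (the class and method names are made up for illustration):

```java
public class DisplayedSizeSketch {
    static final float USER_UNITS_PER_INCH = 72f; // assumption: default user space, 1 unit = 1/72 inch
    static final float MILLIMETERS_PER_INCH = 25.4f;

    /** Converts a length in user space units to inches. */
    public static float toInches(float userSpaceUnits) {
        return userSpaceUnits / USER_UNITS_PER_INCH;
    }

    /** Converts a length in user space units to millimeters. */
    public static float toMillimeters(float userSpaceUnits) {
        return toInches(userSpaceUnits) * MILLIMETERS_PER_INCH;
    }

    public static void main(String[] args) {
        float width = 516.3119f;  // displayed width from the output above
        float height = 700.3752f; // displayed height from the output above
        System.out.println("inches      = " + toInches(width) + ", " + toInches(height));
        System.out.println("millimeters = " + toMillimeters(width) + ", " + toMillimeters(height));
    }
}
```

For the values above this gives roughly 7.17 x 9.73 inches and 182.1 x 247.1 mm, in line with the printed result.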

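Beware, this is not a perfect fix but rather a proof of concept and a work-around: it neither restricts the pattern to the area that is actually filled, nor does it return multiple finds for an area large enough to need multiple pattern tiles to fill it. Nonetheless, it returns the image match for the document at hand.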
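To illustrate the second limitation: a complete solution would have to report one find per pattern tile that intersects the filled area. The following is only a rough sketch of how such a tile count could be estimated. It assumes an axis-aligned pattern whose tile k along an axis covers [origin + k * step, origin + (k + 1) * step) with a positive step; all names and numbers are hypothetical and not taken from the document at hand.

```java
public class TileCountSketch {
    /**
     * Number of tiles along one axis that intersect the interval [min, max),
     * where tile k covers [origin + k * step, origin + (k + 1) * step) and step > 0.
     */
    public static long tilesAlongAxis(double min, double max, double origin, double step) {
        long first = (long) Math.floor((min - origin) / step);    // index of the first tile touched
        long endExclusive = (long) Math.ceil((max - origin) / step); // index after the last tile touched
        return Math.max(0, endExclusive - first);
    }

    /** Number of tiles intersecting the rectangle [x0, x1) x [y0, y1). */
    public static long tilesInRectangle(double x0, double y0, double x1, double y1,
                                        double originX, double originY,
                                        double xStep, double yStep) {
        return tilesAlongAxis(x0, x1, originX, xStep) * tilesAlongAxis(y0, y1, originY, yStep);
    }

    public static void main(String[] args) {
        // Hypothetical 250 x 100 fill area with 100 x 100 tiles starting at the origin
        System.out.println(tilesInRectangle(0, 0, 250, 100, 0, 0, 100, 100)); // prints 3
    }
}
```

The proof of concept in this answer always reports the image once, regardless of how many tiles the filled area spans.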
