Reducing the resources of your page objects

Question

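I used this technique to export the AcroForm from a source PDF into a new PDF file.

The resulting PDF, containing only the AcroForm: download here

I compressed that PDF (59 KB) with the online tool pdfcompressor, which reduced its size by 64%. The site appears to strip everything unused from Resources, as a PDFDebugger screenshot shows.

My question is: how can I get each XObject or font from Resources [], check whether it is used somewhere on the page, and remove it from Resources [] if it is not?

If searching a PDPage for whether a resource is used is complicated, how can I simply remove an XObject or Font from Resources []?

Since searching the page for used XObjects is beyond me, I just tried to remove the COSObject entries directly, but that does not work ^^: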

        for (PDPage page : document.getPages()) {

            PDResources resources = page.getResources();

            // all xobject form resources
            for (COSName name : resources.getXObjectNames()) {
                page.getCOSObject().removeItem(name); // NOT WORKS
            }

            // all font resources from pages
            for (COSName name : resources.getFontNames()) {
                if (resources.getFont(name) instanceof PDFont) {
                    page.getCOSObject().removeItem(name); // NOT WORKS
                }

            }
        }

PS: this question was created following @mkl's suggestion, discussed here

update1

This is my current code for extracting the AcroForm from the PDF:

// create from the original file
PDDocument documentSrc = PDDocument.load(new File("original.pdf"));
PDAcroForm acroFormSrc = documentSrc.getDocumentCatalog().getAcroForm();

PDDocument documentDest = new PDDocument();
for (PDPage page : documentSrc.getPages()) {
    PDPage destPage  = new PDPage(PDRectangle.A4);
    destPage.setMediaBox(page.getMediaBox());
    destPage.setCropBox(page.getCropBox());
    documentDest.addPage(destPage);
}

PDAcroForm acroFormDest = new PDAcroForm(documentDest);


acroFormDest.setCacheFields(true);
acroFormDest.setFields(acroFormSrc.getFields());
documentDest.getDocumentCatalog().setAcroForm(acroFormDest);

int pageIndex = 0;
for (PDPage page : documentSrc.getPages()) {
    documentDest.getPage(pageIndex).setAnnotations(page.getAnnotations());
    // after disabling this size increase
    //documentDest.getPage(pageIndex).setResources(page.getResources());
    pageIndex++;
}

acroFormDest.setDefaultAppearance(acroFormSrc.getDefaultAppearance());
acroFormDest.setDefaultResources(acroFormSrc.getDefaultResources());
acroFormDest.setQ(acroFormSrc.getQ());

// this is disabled because setResources is disabled above
//removeLinksInPages(documentDest);
//removeTextInDocument(documentDest);

The result: pdf without resources

This time the form without resources is 73 KB, while my original PDF is 75 KB.

Answer 1

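Reducing the resources of your page objects

Well, I think your current task is much simpler than your question makes it appear. I interpret your

"I used this technique to export the AcroForm from a source PDF into a new PDF file."

as implying that you really only want to transfer the AcroForm fields and their functionality from one PDF to another, and that you are not interested in the static page content of the original file.

The answer to the question of which page resources you actually use is therefore simple: none! Page resources are the resources used by the static content (in the page content streams), which you are not interested in.

So there is no need to copy the page resources into the new document in the first place; simply remove the line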

documentDest.getPage(pageIndex).setResources(page.getResources());

from the code in the referenced answer.
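If page resources have already been copied, the entries the question tries to delete do not sit in the page dictionary itself. They are stored under the /XObject and /Font sub-dictionaries of the page's resources dictionary, which is why `page.getCOSObject().removeItem(name)` has no effect. The following is a minimal sketch of dropping those sub-dictionaries, assuming PDFBox 2.x or later on the classpath; the class name `PageResourceStripper` and the method `stripXObjectsAndFonts` are hypothetical names chosen for illustration. It only makes sense when the static page content that uses these resources is removed as well, because a content stream that still references them can no longer be rendered correctly.

```java
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;

public class PageResourceStripper {

    /**
     * Hypothetical helper (not part of PDFBox): removes the /XObject and
     * /Font sub-dictionaries from the resources of every page.
     */
    public static void stripXObjectsAndFonts(PDDocument document) {
        for (PDPage page : document.getPages()) {
            PDResources resources = page.getResources();
            if (resources == null) {
                continue; // page has no (own or inherited) resources
            }
            // The names returned by getXObjectNames()/getFontNames() are keys
            // of these sub-dictionaries, not keys of the page dictionary.
            COSDictionary resourceDict = resources.getCOSObject();
            resourceDict.removeItem(COSName.XOBJECT);
            resourceDict.removeItem(COSName.FONT);
        }
    }
}
```

Removing the whole sub-dictionary also avoids modifying a key set while iterating over it, which removing entries one by one inside the question's loops would risk. Note that resources inherited from a parent page tree node are shared, so stripping them affects every page that inherits them.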

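By the way, @Tilman already pointed out in a comment on the answer you used as a template that the resources of interest are "the default resources of the acroform", not the page resources. So you probably want to copy not only the fields between the PDAcroForm instances:

acroFormDest.setFields(acroFormSrc.getFields());

but also the default resources, the default appearance, and the default quadding: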

acroFormDest.setDefaultAppearance(acroFormSrc.getDefaultAppearance());
acroFormDest.setDefaultResources(acroFormSrc.getDefaultResources());
acroFormDest.setQ(acroFormSrc.getQ());

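Other issues

Annotations referencing back to the wrong page

"This time the form without resources is 73 KB, while my original PDF is 75 KB."

Looking more closely at "form-without-resources.pdf", the problem becomes apparent: the P entries of your widget annotations point to the wrong page!

The P value is specified as

P   dictionary   (Optional except as noted below; PDF 1.3; not used in FDF files) An indirect reference to the page object with which this annotation is associated.

(ISO 32000-1, Table 164 – Entries common to all annotation dictionaries)

You set the annotations of the target page to those of the source page, but those annotations still reference the source page in their P values. Through this reference you drag the source page and all of its resources into the new document. It is therefore no surprise that the resulting file is hardly smaller than the source file.

If you change your code to correct the P references, e.g. like this: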

int pageIndex = 0;
for (PDPage page : documentSrc.getPages()) {
    PDPage destPage = documentDest.getPage(pageIndex);
    destPage.setAnnotations(page.getAnnotations());
    for (PDAnnotation annotation : destPage.getAnnotations())
        annotation.setPage(destPage);
    // after disabling this size increase
    //documentDest.getPage(pageIndex).setResources(page.getResources());
    pageIndex++;
}

(CopyForm test testCopyLikeBeeImproved)

you lose the references to the old data.
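One way to confirm that no annotation still drags a source page along is to compare each annotation's P entry with the page that lists it. The following is a minimal sketch at the COS level, assuming PDFBox 2.x or later; the class `AnnotationPageCheck` and the method `countForeignPageReferences` are hypothetical names, not PDFBox API. Annotations without a P entry are not counted, since the entry is optional.

```java
import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public class AnnotationPageCheck {

    /**
     * Hypothetical helper: counts annotations whose /P entry refers to a
     * page dictionary other than the page whose /Annots array contains them.
     */
    public static int countForeignPageReferences(PDDocument document) {
        int foreign = 0;
        for (PDPage page : document.getPages()) {
            COSDictionary pageDict = page.getCOSObject();
            COSBase annots = pageDict.getDictionaryObject(COSName.ANNOTS);
            if (!(annots instanceof COSArray)) {
                continue; // page has no annotations
            }
            COSArray array = (COSArray) annots;
            for (int i = 0; i < array.size(); i++) {
                // getObject resolves indirect references to the actual object
                COSBase annotation = array.getObject(i);
                if (!(annotation instanceof COSDictionary)) {
                    continue;
                }
                COSBase p = ((COSDictionary) annotation).getDictionaryObject(COSName.P);
                if (p != null && p != pageDict) {
                    foreign++;
                }
            }
        }
        return foreign;
    }
}
```

Calling `AnnotationPageCheck.countForeignPageReferences(documentDest)` before saving should return 0 once the loop above has called `annotation.setPage(destPage)` for every annotation; any other value indicates annotations that still point at pages of the source document.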
