Question

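I am maintaining a web application that uses iText 2.1.7 to create PDFs. I want to take the contents of an existing PDF and put them into the PDF document that the code is in the middle of creating. I have the following (EDIT: more complete code):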

package itexttest;

import com.lowagie.text.Document;
import com.lowagie.text.PageSize;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfCopy;
import com.lowagie.text.pdf.PdfImportedPage;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.PdfWriter;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;

public class ITextTest 
{
    public static void main(String[] args) 
    {
        try
        {
            ByteArrayOutputStream os = new ByteArrayOutputStream();
            Document bigDoc = new Document(PageSize.LETTER, 50, 50, 110, 60);
            PdfWriter writer = PdfWriter.getInstance(bigDoc, os);
            bigDoc.open();

            Paragraph par = new Paragraph("one");
            bigDoc.add(par);
            bigDoc.add(new Paragraph("three"));

            addPdfPage(bigDoc, os, "c:/insertable.pdf");

            bigDoc.close();
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

    private static void addPdfPage(Document document, OutputStream outputStream, String location) {
        try {

            PdfReader pdfReader = new PdfReader(location);
            int pages = pdfReader.getNumberOfPages();

            PdfCopy pdfCopy = new PdfCopy(document, outputStream);
            PdfImportedPage page = pdfCopy.getImportedPage(pdfReader, 1);
            pdfCopy.addPage(page);
        }
        catch (Exception e) {
            System.out.println("Cannot add PDF from PSC: <" + location + ">: " + e.getMessage());
            e.printStackTrace();
        }
    }

}

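This throws an error, a null pointer starting from PdfWriter.getPageReference().

How am I using this incorrectly? How can I get a page from the existing document and put it into the current document? Note that I am not in a place where I can conveniently write files to temporary storage or anywhere else.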

Answer 1

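I have not actively worked with the old iText versions for a while, but some things have not changed since then. So here are some issues in your code and pointers that help to resolve them:

The main issues in your current code are that you

reuse the Document instance (which you already use for the PdfWriter and have already opened) for the PdfCopy; while a Document can support multiple listeners, they all need to be registered before open is called; the use case for this construct is creating the same document in two different formats in parallel; and you
use the same output stream for both the PdfWriter and the PdfCopy; the result is not one valid PDF but byte ranges from two different PDFs mixed together, i.e. something that will definitely not be a valid PDF.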
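
The second issue can be illustrated without iText at all. The following is only a toy analogy, assuming a hypothetical `Producer` class that stands in for a PdfWriter or PdfCopy and merely frames its text with start and end markers. It shows that two independent producers sharing one stream yield interleaved output that belongs to neither of them, while separate streams keep each output intact.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Toy analogy without iText: what happens when two producers share one stream. */
final class SharedStreamDemo {

    /** Hypothetical stand-in for a PdfWriter or PdfCopy; it only frames text with markers. */
    static final class Producer {
        private final ByteArrayOutputStream out;
        private final String tag;

        Producer(ByteArrayOutputStream out, String tag) {
            this.out = out;
            this.tag = tag;
        }

        void start() { write("[" + tag + "-start]"); }
        void body(String text) { write(text); }
        void end() { write("[" + tag + "-end]"); }

        private void write(String s) {
            byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
            out.write(bytes, 0, bytes.length);
        }
    }

    /** Runs the same sequence of calls against the two given streams. */
    private static void run(ByteArrayOutputStream forW, ByteArrayOutputStream forC) {
        Producer w = new Producer(forW, "W");
        Producer c = new Producer(forC, "C");
        w.start();
        c.start();
        w.body("one");
        c.body("page");
        w.end();
        c.end();
    }

    /** Both producers write to one stream, as in the question's code. */
    static String sharedStream() {
        ByteArrayOutputStream shared = new ByteArrayOutputStream();
        run(shared, shared);
        return new String(shared.toByteArray(), StandardCharsets.US_ASCII);
    }

    /** Each producer gets its own stream, so each output stays intact. */
    static String[] separateStreams() {
        ByteArrayOutputStream a = new ByteArrayOutputStream();
        ByteArrayOutputStream b = new ByteArrayOutputStream();
        run(a, b);
        return new String[] {
            new String(a.toByteArray(), StandardCharsets.US_ASCII),
            new String(b.toByteArray(), StandardCharsets.US_ASCII)
        };
    }

    public static void main(String[] args) {
        System.out.println(sharedStream());
        for (String s : separateStreams()) {
            System.out.println(s);
        }
    }
}
```

A real PDF is of course far more structured than these markers, but the effect is the same: the shared stream contains fragments of two documents and is not a well-formed instance of either.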

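Using `PdfCopy` correctly

You can first create a new PDF containing the new paragraphs in a ByteArrayOutputStream (closing the Document involved), and then copy this PDF and the other pages you want to add into a new PDF.

E.g. like this: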

ByteArrayOutputStream os = new ByteArrayOutputStream();
Document bigDoc = new Document(PageSize.LETTER, 50, 50, 110, 60);
PdfWriter writer = PdfWriter.getInstance(bigDoc, os);
bigDoc.open();
Paragraph par = new Paragraph("one");
bigDoc.add(par);
bigDoc.add(new Paragraph("three"));
bigDoc.close();

ByteArrayOutputStream os2 = new ByteArrayOutputStream();
Document finalDoc = new Document();
// write into the in-memory stream os2, which is read back as the result below
PdfCopy copy = new PdfCopy(finalDoc, os2);
finalDoc.open();
PdfReader reader = new PdfReader(os.toByteArray());
for (int i = 0; i < reader.getNumberOfPages();) {
    copy.addPage(copy.getImportedPage(reader, ++i));
}
PdfReader pdfReader = new PdfReader("c:/insertable.pdf");
copy.addPage(copy.getImportedPage(pdfReader, 1));
finalDoc.close();
reader.close();
pdfReader.close();

// result PDF
byte[] result = os2.toByteArray();

Using only `PdfWriter`

Alternatively, you can change your code to import the page directly into the PdfWriter, e.g. like this:

ByteArrayOutputStream os = new ByteArrayOutputStream();
Document bigDoc = new Document(PageSize.LETTER, 50, 50, 110, 60);
PdfWriter writer = PdfWriter.getInstance(bigDoc, os);
bigDoc.open();
Paragraph par = new Paragraph("one");
bigDoc.add(par);
bigDoc.add(new Paragraph("three"));

PdfReader pdfReader = new PdfReader("c:/insertable.pdf");
PdfImportedPage page = writer.getImportedPage(pdfReader, 1);
bigDoc.newPage();
PdfContentByte canvas = writer.getDirectContent();
canvas.addTemplate(page, 1, 0, 0, 1, 0, 0);

bigDoc.close();
pdfReader.close();

// result PDF
byte[] result = os.toByteArray();

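This approach appears better because no intermediate PDF is required. Unfortunately, this appearance is deceiving; the approach has some drawbacks.

Here, the original page is not copied in full and added to the document as is. Instead, only its content stream is used as the content of a template, which is then referenced from the actual new document page. In particular, this means:

If the dimensions of the imported page differ from those of your new target document, parts of it may be cut off while some parts of the new page remain empty. Because of this you will often find variants of the code above that try to make the imported page and the target page fit by scaling and rotating.
The original page contents are now located in a template that is referenced from the new page. If you import this new page into yet another document using the same mechanism, you get a page that references a template which in turn merely references a template holding the original contents. If you import that page into another document, you get another level of indirection. And so on.

Unfortunately, conforming PDF viewers only need to support this indirection to a limited degree. If you continue this process, your page contents may suddenly no longer be displayed. If the original page already brings along its own hierarchy of referenced templates, this may happen sooner rather than later.
As only the contents are copied, properties of the original page that are not in the content stream are lost. This in particular concerns annotations such as form fields, certain types of highlight markings, or even certain types of free text.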
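
The scaling mentioned in the first point can be sketched as a small calculation. This is only an illustration, not part of iText: the class `PageFit` and the method `fitMatrix` are hypothetical names, and the sketch assumes that the imported page has its lower-left corner at the origin and is not rotated. It computes the six matrix values for a uniform scale that fits the source page inside the target page and centers it.

```java
/** Sketch: computes the six matrix values for placing an imported page on a target page. */
final class PageFit {

    /**
     * Returns {a, b, c, d, e, f} for a uniform scale that fits a source page of
     * srcWidth x srcHeight inside a target page of dstWidth x dstHeight, centered.
     * No rotation is applied, so b and c stay 0.
     * Assumption: the source page's lower-left corner is at (0, 0).
     */
    static double[] fitMatrix(double srcWidth, double srcHeight,
                              double dstWidth, double dstHeight) {
        if (srcWidth <= 0 || srcHeight <= 0) {
            throw new IllegalArgumentException("source page must have a positive size");
        }
        // Use the smaller ratio so that neither dimension overflows the target.
        double scale = Math.min(dstWidth / srcWidth, dstHeight / srcHeight);
        // Distribute the leftover space equally on both sides.
        double offsetX = (dstWidth - srcWidth * scale) / 2;
        double offsetY = (dstHeight - srcHeight * scale) / 2;
        return new double[] { scale, 0, 0, scale, offsetX, offsetY };
    }

    public static void main(String[] args) {
        // LETTER (612 x 792 points) placed on A4 (595 x 842 points).
        double[] m = fitMatrix(612, 792, 595, 842);
        System.out.println(java.util.Arrays.toString(m));
    }
}
```

The resulting values, cast to float, could take the place of the `1, 0, 0, 1, 0, 0` arguments in the `canvas.addTemplate(page, ...)` call above. This only addresses the size mismatch; the indirection and lost annotations described in the other points remain.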

(By the way, in generic PDF specification terms these templates are called Form XObjects.)

This answer explicitly deals with the use of PdfCopy and PdfWriter in the context of merging PDFs.

Answer 2

Here is another version with mkl's corrections incorporated, hopefully with names that lend themselves to other questions.


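If run with the file 'insertable.pdf' in the program's default directory, the program produces the file 'inserted.pdf' in the same directory, with the text lines "one", "two", and "three" on the first page and the first page of 'insertable.pdf' on the second page.

So mkl's corrections work. To use this in the environment where I want it, there are a couple of questions:

The program in which I want to use this functionality is a web application, so there is no ready place with permission to write a file. I assume I can use a ByteArrayOutputStream instead of the output file.

Is it mandatory to create a new output stream in order to insert content? I was hoping there was a way to tell some iText component: "Here is a file; read its first page and insert it into the document/outputStream/writer I already have open." PDF documents in memory can get pretty big; I would rather not have to copy all the existing PDF structure just so I can add another one to it. If I end up inserting pages from multiple other documents, I suppose I might have to do this multiple times...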

iText 2.1.7 PdfCopy.addPage(page) can't find page reference?
