PdfContentStreamEditor rotates images in a PDF file

Date: 2018-03-25 03:23:47

Tags: c# pdf itext pdf-generation pdfstamper

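I hope this is a simple question. I am trying to use iTextSharp to modify some PDF files, but it seems that the XMP metadata iTextSharp places at the end of the file breaks the layout of the PDF file (and I am not familiar enough with the PDF format to understand why).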

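Here's a small section of the original document

And the same section from the 'edited' document

As you can see from the two images above, the document appears to have been rotated. However, a binary diff of the PDF files shows that the only difference is some XMP metadata at the end of the file.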

DIFF of files showing XMP metadata at end as only difference

I have tried opening the file in several PDF viewers (Sumatra PDF, the Edge browser and Adobe Acrobat), and all of them show the same weirdness.

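I guess I have two questions:
a) How can XMP metadata at the end of the file change the PDF file?
b) How do I stop iTextSharp from producing this output? (iTextSharp seems to do this only when I add or edit content, not when I merely remove JavaScript or similar.)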

<EDIT 1>
The code I used with iTextSharp is the PdfContentStreamEditor (verbatim) from this post: https://stackoverflow.com/a/35915789/2535822
</EDIT 1>
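<EDIT 2>
OK, so it seems it is not the XMP metadata after all. I got rid of that by using:

pdfStamper.XmpMetadata = new byte[0];

But there is still a bunch of extra data at the end of the file: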

2 0 obj
<</Producer(PDFCreator 2.5.2.5233; modified using iTextSharp’ 5.5.13 ©2000-2018 iText Group NV \(AGPL-version\))/CreationDate(D:20171206173510+10'30')/ModDate(D:20180325144710+11'00')/Title(þÿ
endobj
404 0 obj
<</Length 0/Type/Metadata/Subtype/XML>>stream

endstream
endobj
405 0 obj
<</Length 3638/Filter/FlateDecode>>stream
xœÍZmÅ/6ÒZ2ÁÆ€
....

</EDIT 2>

2 Answers:

Answer 0 (score: 1)

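I can answer your second question. The metadata you are trying to remove should not be removed. The AGPL version of the DLL you are using adds that metadata regardless of what your code does. You will not be able to remove it using iText, because doing so directly violates the license terms. See: https://itextpdf.com/AGPL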

  

  "You must prominently mention iText and include the iText copyright and AGPL license in output file metadata."

Answer 1 (score: 1)

You have indeed found a bug in the PdfContentStreamEditor I used in this answer. The other issue is not a bug: it requires knowing how to disable a special feature (or quirk, depending on the circumstances) of iText.

Rotation of the content

This part deals with the rotation of content in the sample document PHA-Pro 8 - File.pdf provided by the OP.

As you already have seen yourself, the rotation issue appears connected with the fact that the page rotation of the page in question is not 0.

Indeed, the iText PdfStamper has a feature which in case of rotated pages automatically rotates additions one applies to the OverContent or UnderContent. This feature can be quite handy if you want to add upright content to the page without having to apply rotation yourself to make it upright. In case of the PdfContentStreamEditor, though, all coordinates we receive from the existing content already have the applicable rotation factored in.

Thus, we need to disable this feature. One can do so using the PdfStamper property RotateContents:

using (PdfReader pdfReader = new PdfReader(source))
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, new FileStream(dest, FileMode.Create, FileAccess.Write), (char)0, true))
{
    pdfStamper.RotateContents = false;
    PdfContentStreamEditor editor = new PdfContentStreamEditor();

    for (int i = 1; i <= pdfReader.NumberOfPages; i++)
    {
        editor.EditPage(pdfStamper, i);
    }
}
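
To check which pages of a document carry a non-zero rotation, and are therefore affected by this feature, something like the following sketch could be used. It assumes iTextSharp 5.x is referenced; the class name and the command-line argument are only illustrative and not part of the original code.

```csharp
using System;
using iTextSharp.text.pdf;

class PageRotationReport
{
    static void Main(string[] args)
    {
        // args[0]: path to the PDF to inspect (illustrative, supplied by the caller)
        using (PdfReader pdfReader = new PdfReader(args[0]))
        {
            for (int i = 1; i <= pdfReader.NumberOfPages; i++)
            {
                // GetPageRotation reports the rotation of the page in degrees
                int rotation = pdfReader.GetPageRotation(i);
                Console.WriteLine("Page {0}: rotation {1}", i, rotation);
            }
        }
    }
}
```

Pages reported with a rotation other than 0 are the ones for which RotateContents makes a visible difference.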

Scrambling of text

This part deals with the scrambling of text in the sample document AS62061-2006.pdf provided by the OP.

You have found a bug in the PdfContentStreamEditor. Its Write method contains this loop:

foreach (PdfObject pdfObject in operands)
{
    pdfObject.ToPdf(canvas.PdfWriter, canvas.InternalBuffer);
    canvas.InternalBuffer.Append(operands.Count > ++index ? (byte) ' ' : (byte) '\n');
}

It should instead be:

foreach (PdfObject pdfObject in operands)
{
    pdfObject.ToPdf(null, canvas.InternalBuffer);
    canvas.InternalBuffer.Append(operands.Count > ++index ? (byte) ' ' : (byte) '\n');
}

If one passes the PdfWriter to the ToPdf method of a PdfString and the PdfWriter uses encryption, the string contents get encrypted. But here the string is written into a content stream, and in that case it is not the individual string that must be encrypted but, eventually, the stream as a whole.

This applies to the PDF provided by the OP because

  • the PDF is encrypted using the default password and
  • the OP edited using a PdfStamper in append mode which encrypts the additions using the same password as the original file.

With the original code, the result looks like this:

broken page content

With the fixed code, it looks like this:

proper page content
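
As a side note, the serialization of a PdfString without a writer can be tried in isolation. The following is a minimal sketch, assuming iTextSharp 5.x is referenced; the class name and the sample text are illustrative and not taken from the PdfContentStreamEditor.

```csharp
using System;
using System.IO;
using System.Text;
using iTextSharp.text.pdf;

class PdfStringSerializationSketch
{
    static void Main()
    {
        // Illustrative sample string, not taken from the OP's document
        PdfString text = new PdfString("Hello");

        using (MemoryStream buffer = new MemoryStream())
        {
            // Passing null instead of a PdfWriter means no encryption settings
            // are available to ToPdf, so the string is not encrypted on its own.
            text.ToPdf(null, buffer);

            // Print the serialized bytes to inspect what would end up in the stream
            Console.WriteLine(Encoding.ASCII.GetString(buffer.ToArray()));
        }
    }
}
```

This mirrors the corrected loop above, where each operand is written with a null writer and encryption is left to the enclosing stream.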