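I hope this is a simple question. I am trying to use iTextSharp to modify some PDF files, but it seems that the XMP metadata iTextSharp puts at the end of the file breaks the layout of the PDF (and I am not familiar enough with the PDF format to understand why).
As you can see from the two images above, the document appears to have been rotated. However, a binary diff of the PDF files shows that the only difference is some XMP metadata at the end of the file.
I tried opening the file in several PDF viewers (Sumatra PDF, the Edge browser, and Adobe Acrobat), and all of them show the same weirdness.
I guess I have two questions: a) How can XMP metadata at the end of the file change the PDF? b) How do I get iTextSharp to not produce this output? (iTextSharp only seems to do this when I add or edit content, not when I merely remove JavaScript or similar.)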
<EDIT 1>
The code I am using with iTextSharp is the PdfContentStreamEditor from this post (verbatim): https://stackoverflow.com/a/35915789/2535822
</EDIT 1>
<EDIT 2>
OK, it seems it is not the XMP metadata. I removed it by using:
pdfStamper.XmpMetadata = new byte[0];
But there is still a bunch of extra data at the end of the file:
2 0 obj
<</Producer(PDFCreator 2.5.2.5233; modified using iTextSharp’ 5.5.13 ©2000-2018 iText Group NV \(AGPL-version\))/CreationDate(D:20171206173510+10'30')/ModDate(D:20180325144710+11'00')/Title(þÿ
endobj
404 0 obj
<</Length 0/Type/Metadata/Subtype/XML>>stream
endstream
endobj
405 0 obj
<</Length 3638/Filter/FlateDecode>>stream
xœÍZmÅ/6ÒZ2ÁÆ€
....
</EDIT 2>
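Answer 0 (score: 1)
I can answer your second question. The metadata you are trying to remove should not be removed. The AGPL version of the DLL you are using adds that metadata no matter what your code does. You will not be able to remove it using iText, because doing so directly violates the license terms. See: https://itextpdf.com/AGPL
You must prominently mention iText and include the iText copyright and AGPL license in output file metadata.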
Answer 1 (score: 1)
You have indeed found a bug in the PdfContentStreamEditor I used in this answer, while the other issue requires one to know how to disable a special feature or quirk (depending on the circumstances) of iText.
This part deals with the rotation of content in the sample document PHA-Pro 8 - File.pdf provided by the OP.
As you already have seen yourself, the rotation issue appears connected with the fact that the page rotation of the page in question is not 0.
Indeed, the iText PdfStamper has a feature which, in case of rotated pages, automatically rotates additions one applies to the OverContent or UnderContent. This feature can be quite handy if you want to add upright content to the page without having to apply rotation yourself to make it upright. In case of the PdfContentStreamEditor, though, all coordinates we receive from the existing content already have the applicable rotation factored in.
Thus, we need to disable this feature. One can do so using the PdfStamper property RotateContents:
using (PdfReader pdfReader = new PdfReader(source))
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, new FileStream(dest, FileMode.Create, FileAccess.Write), (char)0, true))
{
    // Do not let the stamper rotate additions on rotated pages;
    // the coordinates taken from the existing content already include the rotation.
    pdfStamper.RotateContents = false;
    PdfContentStreamEditor editor = new PdfContentStreamEditor();
    // Page numbers are 1-based.
    for (int i = 1; i <= pdfReader.NumberOfPages; i++)
    {
        editor.EditPage(pdfStamper, i);
    }
}
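To see which pages of a document this concerns, one can list the pages whose rotation is not 0. The following is only a minimal sketch, assuming iTextSharp 5.x is referenced and that the path of the PDF to inspect is passed as the first command-line argument (the class name ListPageRotations is made up for illustration):

```csharp
using System;
using iTextSharp.text.pdf;

class ListPageRotations
{
    static void Main(string[] args)
    {
        // args[0]: path of the PDF to inspect (an assumption of this sketch).
        using (PdfReader pdfReader = new PdfReader(args[0]))
        {
            // Page numbers are 1-based.
            for (int i = 1; i <= pdfReader.NumberOfPages; i++)
            {
                // Rotation of the page in degrees, as declared in the page dictionary.
                int rotation = pdfReader.GetPageRotation(i);
                if (rotation != 0)
                {
                    Console.WriteLine("Page {0} is rotated by {1} degrees", i, rotation);
                }
            }
        }
    }
}
```

Any page reported here is one where the stamper would rotate additions unless RotateContents is set to false.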
This part deals with the scrambling of text in the sample document AS62061-2006.pdf provided by the OP.
You have found a bug in the PdfContentStreamEditor. Its Write method contains this loop:
foreach (PdfObject pdfObject in operands)
{
    pdfObject.ToPdf(canvas.PdfWriter, canvas.InternalBuffer);
    canvas.InternalBuffer.Append(operands.Count > ++index ? (byte) ' ' : (byte) '\n');
}
It should instead be
foreach (PdfObject pdfObject in operands)
{
    // Pass null instead of the writer so that strings are not encrypted individually.
    pdfObject.ToPdf(null, canvas.InternalBuffer);
    canvas.InternalBuffer.Append(operands.Count > ++index ? (byte) ' ' : (byte) '\n');
}
If one presents the PdfWriter to the ToPdf method of a PdfString and the PdfWriter uses encryption, the string contents get encrypted. But here the string is written to a stream, and in that case it is not the individual string that must be encrypted but, eventually, the whole stream.
The encryption issue applies to the PDF provided by the OP because that file is encrypted and the code uses a PdfStamper in append mode, which encrypts the additions using the same password as the original file.
With the original code, the result looks like this:
With the fixed code, it looks like this: