I took the code from "How to redact a large rectangle of a PDF by iTextSharp?" and came up with:
iTextSharp.text.pdf.PdfReader reader;
reader = new iTextSharp.text.pdf.PdfReader(new System.IO.FileStream(txtPDFFile.Text, System.IO.FileMode.Open));
string path = System.IO.Path.GetDirectoryName(txtPDFFile.Text);
System.IO.Stream fsOut = new System.IO.FileStream(System.IO.Path.Combine(path,"redacted.pdf"), System.IO.FileMode.OpenOrCreate);
iTextSharp.text.pdf.PdfStamper stamper = new iTextSharp.text.pdf.PdfStamper(reader, fsOut);
List<iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup.PdfCleanUpLocation> cleanUpLocations = new List<iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup.PdfCleanUpLocation>();
cleanUpLocations.Add(new iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup.PdfCleanUpLocation(1, new iTextSharp.text.Rectangle(77f, 77f, 200f, 200f), iTextSharp.text.BaseColor.GRAY));
iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup.PdfCleanUpProcessor cleaner = new iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup.PdfCleanUpProcessor(cleanUpLocations, stamper);
cleaner.CleanUp();
stamper.Close();
reader.Close();
So the only thing that differs from the linked article is the input file I am using.
But in cleaner.CleanUp() I get an "object reference not set" error (a NullReferenceException):
at iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup.PdfCleanUpContentOperator.Invoke(PdfContentStreamProcessor pdfContentStreamProcessor, PdfLiteral oper, List`1 operands)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.InvokeOperator(PdfLiteral oper, List`1 operands)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.ProcessContent(Byte[] contentBytes, PdfDictionary resources)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.FormXObjectDoHandler.HandleXObject(PdfContentStreamProcessor processor, PdfStream stream, PdfIndirectReference refi)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.DisplayXObject(PdfName xobjectName)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.Do.Invoke(PdfContentStreamProcessor processor, PdfLiteral oper, List`1 operands)
at iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup.PdfCleanUpContentOperator.Invoke(PdfContentStreamProcessor pdfContentStreamProcessor, PdfLiteral oper, List`1 operands)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.InvokeOperator(PdfLiteral oper, List`1 operands)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.ProcessContent(Byte[] contentBytes, PdfDictionary resources)
at iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup.PdfCleanUpProcessor.CleanUpPage(Int32 pageNum, IList`1 cleanUpLocations)
at iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup.PdfCleanUpProcessor.CleanUp()
at Com.EDS.DocSol.PDFExtract.PDFExtractForm.btnRedaction_Click(Object sender, EventArgs e) in D:\Users\me\Code\PDFExtract\PDFExtract\PDFExtractForm.cs:line 106
at System.Windows.Forms.Control.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent)
at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
at System.Windows.Forms.Control.WndProc(Message& m)
at System.Windows.Forms.ButtonBase.WndProc(Message& m)
at System.Windows.Forms.Button.WndProc(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
at System.Windows.Forms.NativeWindow.DebuggableCallback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)
at System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG& msg)
at System.Windows.Forms.Application.ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(Int32 dwComponentID, Int32 reason, Int32 pvLoopData)
at System.Windows.Forms.Application.ThreadContext.RunMessageLoopInner(Int32 reason, ApplicationContext context)
at System.Windows.Forms.Application.ThreadContext.RunMessageLoop(Int32 reason, ApplicationContext context)
at System.Windows.Forms.Application.Run(Form mainForm)
at Com.EDS.DocSol.PDFExtract.Program.Main(String[] args) in D:\Users\me\Code\PDFExtract\PDFExtract\Program.cs:line 140
at System.AppDomain._nExecuteAssembly(Assembly assembly, String[] args)
at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
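I don't understand why. I did not change the rectangle, and I am not sure whether there is actually anything at that location. I also have some code that first adds an annotation and then tries to apply it, but it fails with the same object reference error.
In the code above... do I need to create a redaction annotation first before applying it, or does this code select the box I want to redact and apply the redaction in a single pass?
The rectangle I actually want (it is an address block) is: iTextSharp.text.Rectangle(45,650,200,750);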
Answer 0 (score: 0)
Regarding the OP's original observation, an object reference error
at iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup.PdfCleanUpContentOperator.Invoke(PdfContentStreamProcessor pdfContentStreamProcessor, PdfLiteral oper, List`1 operands)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.InvokeOperator(PdfLiteral oper, List`1 operands)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.ProcessContent(Byte[] contentBytes, PdfDictionary resources)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.FormXObjectDoHandler.HandleXObject(PdfContentStreamProcessor processor, PdfStream stream, PdfIndirectReference refi)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.DisplayXObject(PdfName xobjectName)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.Do.Invoke(PdfContentStreamProcessor processor, PdfLiteral oper, List`1 operands)
at iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup.PdfCleanUpContentOperator.Invoke(PdfContentStreamProcessor pdfContentStreamProcessor, PdfLiteral oper, List`1 operands)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.InvokeOperator(PdfLiteral oper, List`1 operands)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.ProcessContent(Byte[] contentBytes, PdfDictionary resources)
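occurs while the cleanup processor parses the content stream of a form xobject: some instruction in that content stream appears to be invalid (most likely its operands are invalid), i.e. the PDF is most likely simply broken.
This behavior could not be reproduced with the desensitised version of the OP's document. In particular, the desensitised version contains only a single form xobject on each page, and it does not intersect the redaction area. When the redaction area is extended to partially intersect the form xobject, an exception does occur, but a different one, which clearly indicates that System.Drawing.Graphics.FromImage cannot handle the format of the bitmap image shown in the form xobject.
Thus, the invalid form xobject content seems to have been removed during desensitisation. To fix the cleanup code for the issue at hand, the original document would be required.
In a comment the OP indicated that he also tried to invoke the cleanup process in a different way, namely by adding a redaction annotation to the PDF and then calling the cleanup process without any PdfCleanUpLocation instances. He added the redaction annotation like this: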
PdfReader reader = new PdfReader(new FileStream(txtPDFFile.Text, FileMode.Open));
using (PdfStamper stamper = new PdfStamper(reader, new FileStream(txtPDFFile.Text + ".pdf", FileMode.OpenOrCreate)))
{
// Add the annotations
int page = 1;
Rectangle rect = new Rectangle(45, 650, 200, 750);
PdfAnnotation annotation = new PdfAnnotation(stamper.Writer, rect);
annotation.Put(PdfName.SUBTYPE, new PdfName("Redact"));
stamper.AddAnnotation(annotation, page);
} //Using
The cleanup now also fails with an object reference error, but this time
at iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup.PdfCleanUpProcessor.ExtractLocationsFromRedactAnnot(Int32 page, Int32 annotIndex, PdfDictionary annotDict)
at iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup.PdfCleanUpProcessor.ExtractLocationsFromRedactAnnots(Int32 page, PdfDictionary pageDict)
at iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup.PdfCleanUpProcessor.ExtractLocationsFromRedactAnnots()
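The cause in this case is a bug in the cleanup code. Redaction annotations generated by e.g. Adobe Reader usually contain an additional entry, QuadPoints, which holds a number of quadrilaterals specifying the areas inside the annotation rectangle that are actually to be redacted; if this entry is absent, the whole rectangle is redacted.
iTextSharp has the following code in this context: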
PdfArray quadPoints = annotDict.GetAsArray(PdfName.QUADPOINTS);
if (quadPoints.Size != 0) {
markedRectangles.AddRange(TranslateQuadPointsToRectangles(quadPoints));
} else {
... add a range for the annotation rectangle ...
}
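Unfortunately, if the annotation has no QuadPoints entry, annotDict.GetAsArray returns null and the evaluation of quadPoints.Size fails with an exception. The condition should be
if (quadPoints != null && quadPoints.Size != 0) {
instead.
The OP can work around this by adding a QuadPoints entry with an empty array to his redaction annotation: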
...
annotation.Put(PdfName.SUBTYPE, new PdfName("Redact"));
annotation.Put(PdfName.QUADPOINTS, new PdfArray()); // <<<<<<<<
stamper.AddAnnotation(annotation, page);
...
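Note: this is merely a workaround for this iTextSharp issue; it should not be done if the annotated PDF is meant to be used by other software. Strictly speaking, an empty QuadPoints entry means that nothing is to be redacted.
By the way, there is an issue in the OP's code: when creating the file stream for the PdfStamper to write to, he uses FileMode.OpenOrCreate: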
System.IO.Stream fsOut = new System.IO.FileStream(System.IO.Path.Combine(path,"redacted.pdf"), System.IO.FileMode.OpenOrCreate);
or
using (PdfStamper stamper = new PdfStamper(reader, new FileStream(txtPDFFile.Text + ".pdf", FileMode.OpenOrCreate)))
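If a file with that name already exists and is longer than the new PDF, the result will have the size of the old file, with the remainder of the old file still sitting in the extra space. The new file thus effectively carries trailing garbage, which makes the PDF invalid, so that e.g. Adobe Reader offers to repair it.
In general, one should use FileMode.Create instead. From its documentation:
FileMode.Create is equivalent to requesting that if the file does not exist, use System.IO.FileMode.CreateNew; otherwise, use System.IO.FileMode.Truncate.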
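The difference between the two modes can be observed without iTextSharp at all. The following is only a minimal sketch using plain System.IO; the class name FileModeDemo, the helper WriteWith, the temp file name, and the byte counts are made up for illustration and are not part of the OP's code:

```csharp
using System;
using System.IO;

public static class FileModeDemo
{
    // Hypothetical helper: writes payload to path using the given FileMode
    // and returns the length of the file afterwards.
    public static long WriteWith(FileMode mode, string path, byte[] payload)
    {
        using (var fs = new FileStream(path, mode, FileAccess.Write))
        {
            fs.Write(payload, 0, payload.Length);
        }
        return new FileInfo(path).Length;
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "filemode-demo.bin");
        byte[] oldContent = new byte[1000]; // stands in for an older, longer PDF
        byte[] newContent = new byte[100];  // stands in for the new, shorter PDF

        // OpenOrCreate starts writing at offset 0 but does not truncate,
        // so the last 900 bytes of the old file remain in place.
        File.WriteAllBytes(path, oldContent);
        Console.WriteLine(WriteWith(FileMode.OpenOrCreate, path, newContent)); // 1000

        // Create truncates the existing file before writing.
        File.WriteAllBytes(path, oldContent);
        Console.WriteLine(WriteWith(FileMode.Create, path, newContent)); // 100

        File.Delete(path);
    }
}
```

With a real PDF, those leftover bytes would follow the new file's end-of-file marker, which is presumably what makes the viewer offer a repair.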