Question

我正在尝试遍历PDF上的每个页面以查找特定关键字。代码在其他PDF上运行正常，但one

除外

我的代码

Using oReader As New pdf.PdfReader(pdfFilename)

    For pCurrent = oReader.NumberOfPages To 1 Step -1
        Dim strategy As pdf.parser.ITextExtractionStrategy = New pdf.parser.SimpleTextExtractionStrategy()
        Dim pageText As String = pdf.parser.PdfTextExtractor.GetTextFromPage(oReader, pCurrent, strategy)

        '
        'search for keywords
        '
        'FindVOI

    Next 'proceed next page

End Using

以下是导致此异常的代码片段，

Dim pageText As String = pdf.parser.PdfTextExtractor.GetTextFromPage(oReader, pCurrent, strategy)

在PDF的第98页上显示异常Stack empty，有什么想法是错的吗？

完全例外：

Exception thrown: 'System.InvalidOperationException' in System.dll
System.Transactions Critical: 0 : <TraceRecord xmlns="http://schemas.microsoft.com/2004/10/E2ETraceEvent/TraceRecord" Severity="Critical"><TraceIdentifier>http://msdn.microsoft.com/TraceCodes/System/ActivityTracing/2004/07/Reliability/Exception/Unhandled</TraceIdentifier><Description>Unhandled exception</Description><AppDomain>VipMonitorService.vshost.exe</AppDomain><Exception><ExceptionType>System.InvalidOperationException, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</ExceptionType><Message>Stack empty.</Message><StackTrace>   at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
   at System.Collections.Generic.Stack`1.Pop()
   at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.EndMarkedContentC.Invoke(PdfContentStreamProcessor processor, PdfLiteral oper, List`1 operands)
   at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.InvokeOperator(PdfLiteral oper, List`1 operands)
   at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.ProcessContent(Byte[] contentBytes, PdfDictionary resources)
   at iTextSharp.text.pdf.parser.PdfReaderContentParser.ProcessContent[E](Int32 pageNumber, E renderListener, IDictionary`2 additionalContentOperators)
   at iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(PdfReader reader, Int32 pageNumber, ITextExtractionStrategy strategy)
   at WatcherApp.VipMonitorService.PDFHelper.FindVOI(List`1 voiList, String pdfFilename, Boolean searchFromLast, Int32 searchNumberOfPagesInPercent) in \\Mac\Dropbox\git\Personal\WatcherApp\VipMonitorService\PDFHelper.vb:line 59
   at WatcherApp.VipMonitorService.Controller.ProcessAnnualReport(Announcement a) in \\Mac\Dropbox\git\Personal\WatcherApp\VipMonitorService\Controller.vb:line 456
   at WatcherApp.VipMonitorService.Controller.ProcessARInQueueThread() in \\Mac\Dropbox\git\Personal\WatcherApp\VipMonitorService\Controller.vb:line 362
   at WatcherApp.VipMonitorService.Controller._Lambda$__40-0() in \\Mac\Dropbox\git\Personal\WatcherApp\VipMonitorService\Controller.vb:line 339
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart()</StackTrace><ExceptionString>System.InvalidOperationException: Stack empty.
   at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
   at System.Collections.Generic.Stack`1.Pop()
   at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.EndMarkedContentC.Invoke(PdfContentStreamProcessor processor, PdfLiteral oper, List`1 operands)
   at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.InvokeOperator(PdfLiteral oper, List`1 operands)
   at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.ProcessContent(Byte[] contentBytes, PdfDictionary resources)
   at iTextSharp.text.pdf.parser.PdfReaderContentParser.ProcessContent[E](Int32 pageNumber, E renderListener, IDictionary`2 additionalContentOperators)
   at iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(PdfReader reader, Int32 pageNumber, ITextExtractionStrategy strategy)
   at WatcherApp.VipMonitorService.PDFHelper.FindVOI(List`1 voiList, String pdfFilename, Boolean searchFromLast, Int32 searchNumberOfPagesInPercent) in \\Mac\Dropbox\git\Personal\WatcherApp\VipMonitorService\PDFHelper.vb:line 59
   at WatcherApp.VipMonitorService.Controller.ProcessAnnualReport(Announcement a) in \\Mac\Dropbox\git\Personal\WatcherApp\VipMonitorService\Controller.vb:line 456
   at WatcherApp.VipMonitorService.Controller.ProcessARInQueueThread() in \\Mac\Dropbox\git\Personal\WatcherApp\VipMonitorService\Controller.vb:line 362
   at WatcherApp.VipMonitorService.Controller._Lambda$__40-0() in \\Mac\Dropbox\git\Personal\WatcherApp\VipMonitorService\Controller.vb:line 339
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart()</ExceptionString></Exception></TraceRecord>

Answer 1

在PDF的第98页上显示异常堆栈为空，任何想法有什么问题？

堆栈跟踪显示 Stack empty 出现在iTextSharp.text.pdf.parser.PdfContentStreamProcessor.EndMarkedContentC.Invoke。因此，我们应该查看起始和结束标记内容运算符：

标签   的 BMC   开始由平衡 EMC 运算符终止的标记内容序列。标记应该是一个名称对象，表示序列的作用或重要性。

标记属性   的 BDC   使用关联的属性列表开始标记内容序列，并通过平衡 EMC 运算符终止。标记应该是一个名称对象，表示序列的作用或重要性。 properties 应该是包含属性列表的内联字典，或者是当前资源字典的 Properties 子字典中与之关联的名称对象（请参阅14.6.2，“属性列表”） “）。

-   的 EMC   结束由 BMC 或 BDC 运算符开始的标记内容序列。

（表320 - 标记内容运算符，ISO 32000-1）

如果您查看相关网页上的 BDC / BMC 和 EMC 标记内容的开头和结尾，您会看到：

/Artifact <</O /Layout >>BDC EMC /Artifact <</O /Layout >>BDC EMC /Artifact <</O /Layout >>BDC EMC /Artifact <</BBox [0 33.8887 407.4289 0 ]/O /Layout >>BDC EMC EMC ...

因此，有一个剩余的 EMC 运算符，其中没有 BMC 或 BDC 运算符来结束标记的内容。

因此，本文件不是有效的PDF;特别是其标记的内容结构被打破了。

话虽如此，如果iTextSharp在Pop之前检查堆栈并且可选择抛出更明确的异常或忽略 EMC 运算符，那将是合适的。

从PDF页面获取文本时，iTextSharp异常“Stack empty”

1 个答案: