Question

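I have a Windows service that uses iTextSharp version 5.5.11 to read PDFs from a database, extract their text, and upload that text back to the database so it can be searched.

It had been working well. But now it seems that every PDF it opens throws the same exception: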

iTextSharp.text.exceptions.InvalidPdfException: Rebuild failed: Dictionary key R is not a name. at file pointer 9195; Original message: Dictionary key R is not a name. at file pointer 9195 
at iTextSharp.text.pdf.PdfReader..ctor(IRandomAccessSource byteSource, Boolean partialRead, Byte[] ownerPassword, X509Certificate certificate, ICipherParameters certificateKey, Boolean closeSourceOnConstructorError) 
at iTextSharp.text.pdf.PdfReader..ctor(Byte[] pdfIn)

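These PDFs come from different sources and were uploaded at different times and in different ways, yet none of them can be opened by iTextSharp.

The same process has worked for many other PDFs. Although the exception now seems to occur for most PDFs, it is not 100% of them: some files still make it past this code.

Is something in my code causing this? Or do all of these seemingly random PDFs actually share the exact same problem? Or is there some other reason this is happening? Any help is appreciated!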
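To help answer those questions, one thing that could be checked is what actually sits at the offset named in the exception (file pointer 9195). In PDF syntax, dictionary keys are name objects that begin with `/`, while a bare `R` normally ends an indirect reference such as `12 0 R`, so the message suggests the parser met an `R` where it expected a key. The following is a minimal sketch, not part of the service: `PdfByteInspector` and `DumpAround` are hypothetical names, and it assumes the reported file pointer is a plain byte offset into the same array that is passed to `PdfReader`.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of iTextSharp.
public static class PdfByteInspector
{
    // Returns the bytes in [offset - radius, offset + radius) as text.
    // Printable ASCII is kept; every other byte is shown as '.' so the
    // result stays on one line and is safe to write to a log.
    public static string DumpAround(byte[] data, long offset, int radius)
    {
        if (data == null || data.Length == 0)
        {
            return string.Empty;
        }

        long start = Math.Max(0, offset - radius);
        long end = Math.Min(data.Length, offset + radius);
        if (start >= end)
        {
            return string.Empty;
        }

        var sb = new StringBuilder((int)(end - start));
        for (long i = start; i < end; i++)
        {
            char c = (char)data[i];
            sb.Append(c < 0x20 || c > 0x7E ? '.' : c);
        }
        return sb.ToString();
    }
}
```

Calling `PdfByteInspector.DumpAround(PdfFileData, 9195, 40)` on a failing file and logging the result would show whether the bytes near that offset look like ordinary PDF dictionary syntax or like something damaged.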

Edit 1: Here is the code that throws the exception; the last line is where the exception occurs:

    public string ParseDocumentsFileContents(byte[] PdfFileData, Guid fileId, string fileName)
    {
        if (PdfFileData == null || PdfFileData.Count() <= 0)
        {
            return null;
        }

        iTextSharp.text.pdf.PdfReader.unethicalreading = true;

        //read the text from each page, and put it all together
        var PageContents = new System.Text.StringBuilder(PdfFileData.Count());
        using (var engine = new Tesseract.TesseractEngine(tessdata_datapath, "eng", Tesseract.EngineMode.Default))
        using (var reader = new iTextSharp.text.pdf.PdfReader(PdfFileData))
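
Since `PdfFileData` comes from the database, it may be worth ruling out truncated or altered bytes before the array reaches `PdfReader`. The sketch below is only a rough sanity check under stated assumptions: `PdfSanityCheck` and its methods are hypothetical names, and the 1024-byte search windows are a lenient convention assumed here, not something taken from the service or from iTextSharp. Passing the check does not prove a file is valid; failing it would point at the storage or upload path instead of the parser.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of iTextSharp.
public static class PdfSanityCheck
{
    // True if the ASCII marker occurs entirely within data[start, end).
    private static bool ContainsAscii(byte[] data, string marker, int start, int end)
    {
        byte[] pattern = Encoding.ASCII.GetBytes(marker);
        for (int i = start; i + pattern.Length <= end; i++)
        {
            int j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j])
            {
                j++;
            }
            if (j == pattern.Length)
            {
                return true;
            }
        }
        return false;
    }

    // True if "%PDF-" appears within the first 1024 bytes (assumed window).
    public static bool HasPdfHeader(byte[] data)
    {
        if (data == null)
        {
            return false;
        }
        return ContainsAscii(data, "%PDF-", 0, Math.Min(data.Length, 1024));
    }

    // True if "%%EOF" appears within the last 1024 bytes (assumed window).
    public static bool HasEofMarker(byte[] data)
    {
        if (data == null)
        {
            return false;
        }
        return ContainsAscii(data, "%%EOF", Math.Max(0, data.Length - 1024), data.Length);
    }
}
```

Logging `HasPdfHeader(PdfFileData)` and `HasEofMarker(PdfFileData)` together with `fileId` for both failing and working files would show whether the failures line up with incomplete data.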

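Here is an example of a PDF that resulted in this exception. (I realize there is no text in this PDF, but I had to find one that I could share.)

Here are some other examples of PDFs that all resulted in this exception: Example #2, Example #3, Example #4

Edit 2: I upgraded to iTextSharp version 5.5.12 and got the same exception. In both versions, however, the following PDFs (and others) do not cause this exception: Does Not Exception Here #1 and Does Not Exception Here #2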

iTextSharp exception: Rebuild failed: Dictionary key R is not a name. at file pointer 9195

0 answers: