Question

I have the following code:

using (var reader = new PdfReader(pdfPath))
{
    for (int pageIndex = 1; pageIndex <= reader.NumberOfPages; pageIndex++)
    {
        var text = PdfTextExtractor.GetTextFromPage(reader, pageIndex);
        //my other logic goes here
    }
}

I get "Value cannot be null" at this line:

using (var reader = new PdfReader(pdfPath))

I'm not sure why it fails for some PDFs. I can read 100 PDFs without problems, but 4 of them give me this error.

Error:

System.ArgumentNullException: Value cannot be null.
Parameter name: key
   at System.Collections.Generic.Dictionary`2.FindEntry(TKey key)
   at System.Collections.Generic.Dictionary`2.TryGetValue(TKey key, TValue& value)
   at System.util.collections.HashSet2`1.AddAndCheck(T item)
   at iTextSharp.text.pdf.PdfReader.PageRefs.IteratePages(PRIndirectReference rpage)
   at iTextSharp.text.pdf.PdfReader.PageRefs.IteratePages(PRIndirectReference rpage)
   at iTextSharp.text.pdf.PdfReader.PageRefs.IteratePages(PRIndirectReference rpage)
   at iTextSharp.text.pdf.PdfReader.PageRefs.IteratePages(PRIndirectReference rpage)
   at iTextSharp.text.pdf.PdfReader.PageRefs.IteratePages(PRIndirectReference rpage)
   at iTextSharp.text.pdf.PdfReader.PageRefs.ReadPages()
   at iTextSharp.text.pdf.PdfReader.PageRefs..ctor(PdfReader reader)
   at iTextSharp.text.pdf.PdfReader.ReadPages()
   at iTextSharp.text.pdf.PdfReader.ReadPdf()
   at iTextSharp.text.pdf.PdfReader..ctor(IRandomAccessSource byteSource, Boolean partialRead, Byte[] ownerPassword, X509Certificate certificate, ICipherParameters certificateKey, Boolean closeSourceOnConstructorError)
   at iTextSharp.text.pdf.PdfReader..ctor(String filename)

My iTextSharp version is 5.5.7.0.

Answer 1

The simplest explanation would be that, for those 4 PDFs, pdfPath is null rather than a string. Check pdfPath for null values.

Answer 2

The paths of the 4 PDFs may be invalid, meaning there is no PDF file at those locations.
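
A minimal sketch for ruling this out, together with the null check from the previous answer, before the path ever reaches PdfReader. `PdfPathCheck` and `DescribeProblem` are hypothetical names made up for this example; only `System.IO.File.Exists` and `string.IsNullOrWhiteSpace` come from .NET itself.

```csharp
using System;
using System.IO;

static class PdfPathCheck
{
    // Hypothetical helper (not part of iTextSharp): returns null when the
    // path looks usable, otherwise a short description of the problem.
    public static string DescribeProblem(string pdfPath)
    {
        if (pdfPath == null)
            return "path is null";
        if (string.IsNullOrWhiteSpace(pdfPath))
            return "path is empty";
        if (!File.Exists(pdfPath))
            return "no file at " + pdfPath;
        return null;
    }
}
```

If `DescribeProblem(pdfPath)` returns null for the 4 failing files, the path itself is probably not the cause and the file content is the next thing to look at.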

Answer 3

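To close this topic: I asked the PDF vendor to regenerate the problematic files. They did so and sent them to me, and I could process the new files without any code changes. It looks like there was an error in the PDF content that iTextSharp could not read correctly. I am still puzzled, because nothing changed in the vendor's process. Possibly the PDF structure was corrupted somewhere.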
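
For anyone hitting the same thing in a batch, here is a rough sketch of how the damaged files could be isolated so that one bad PDF does not stop the rest. `PdfBatch` and `ProcessAll` are hypothetical names for this example, and the per-file work is passed in as a delegate so the sketch does not assume anything about iTextSharp beyond what the question already uses.

```csharp
using System;
using System.Collections.Generic;

static class PdfBatch
{
    // Hypothetical helper: runs processOne on every path and collects the
    // failures instead of letting the first damaged file abort the batch.
    public static List<KeyValuePair<string, Exception>> ProcessAll(
        IEnumerable<string> pdfPaths, Action<string> processOne)
    {
        var failures = new List<KeyValuePair<string, Exception>>();
        foreach (var path in pdfPaths)
        {
            try
            {
                processOne(path);
            }
            catch (Exception ex)
            {
                // In the question the failure surfaced as
                // ArgumentNullException; other kinds of damage may throw
                // other types, so this sketch catches broadly and records
                // the exception next to the path.
                failures.Add(new KeyValuePair<string, Exception>(path, ex));
            }
        }
        return failures;
    }
}
```

Here `processOne` would be the body from the question, that is, the `using (var reader = new PdfReader(path))` block with the page loop inside it. The returned list then shows which files to send back to the vendor, along with the exception each one raised.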

iTextSharp PDF reading error
