Question

I am using iTextSharp in a C# application. The PDF reader fails with this error:

The document has no page root (meaning: it is an invalid PDF).

We then tried to get the page count with the following regular expression instead:

using (StreamReader sr = new StreamReader(File.OpenRead(@"C:\test.pdf")))
{
    Regex regex = new Regex(@"/Type\s*/Page[^s]");                
    MatchCollection matches = regex.Matches(sr.ReadToEnd());
    Console.WriteLine(matches.Count);
}
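For comparison, here is a slightly tightened version of the same text-scanning idea. It is a minimal sketch, not a fix. It reads the file as ISO-8859-1 so that every byte maps to exactly one character (a `StreamReader` defaults to UTF-8 and replaces invalid byte sequences found in binary stream data), and it replaces `[^s]` with a negative lookahead, so `/Type /Pages` is still excluded but a `/Type /Page` at the very end of the text is no longer missed. The class name `NaivePageCount` is hypothetical. Because it still counts every textual occurrence, it would still count both sets of matches listed further down.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

static class NaivePageCount
{
    // "/Type /Page" or "/Type/Page" not followed by another name character,
    // so "/Type /Pages" is excluded without consuming the next character.
    static readonly Regex PageType = new Regex(@"/Type\s*/Page(?![A-Za-z0-9])");

    public static int Count(string pdfText) => PageType.Matches(pdfText).Count;

    public static int CountFile(string path)
    {
        // ISO-8859-1 maps each byte to one char, so binary data cannot
        // corrupt the ASCII keywords around it.
        string text = File.ReadAllText(path, Encoding.GetEncoding("ISO-8859-1"));
        return Count(text);
    }
}
```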

But with this regex we get a double count (i.e. if the actual file has 6 pages, the regex returns 12).

Is there any way to get the page count of such a file?

I can't share the file because it was provided by a client. On further examining the file contents, I found the following regex matches:

/Type /XObject 
/Type /Page 
/Type /Catalog 
/Type /XObject 
/Type /Pages 
/Type /Page 
/Type /XObject 
/Type /Pages 
/Type /Page 
/Type /XObject 
/Type /Pages 
/Type /Page 
/Type /XObject 
/Type /Pages 
/Type /Page 
/Type /XObject 
/Type /Pages 
/Type /Page 
%PaperPortPDFversion
1 0 obj<</Contents 10 0 R/CropBox[0 0 595 842]/MediaBox[0 0 595 842]/Parent 2 0 R/Resources 9 0 R/Rotate 180/Type/Page>>
12 0 obj<</Contents 20 0 R/CropBox[0 0 595 842]/MediaBox[0 0 595 842]/Parent 16 0 R/Resources 19 0 R/Rotate 180/Type/Page>>
21 0 obj<</Contents 29 0 R/CropBox[0 0 595 842]/MediaBox[0 0 595 842]/Parent 25 0 R/Resources 28 0 R/Rotate 180/Type/Page>>
30 0 obj<</Contents 38 0 R/CropBox[0 0 595 842]/MediaBox[0 0 595 842]/Parent 34 0 R/Resources 37 0 R/Rotate 180/Type/Page>>
39 0 obj<</Contents 47 0 R/CropBox[0 0 595 842]/MediaBox[0 0 595 842]/Parent 43 0 R/Resources 46 0 R/Rotate 180/Type/Page>>
48 0 obj<</Contents 56 0 R/CropBox[0 0 595 842]/MediaBox[0 0 595 842]/Parent 52 0 R/Resources 55 0 R/Rotate 180/Type/Page>>

These last matches are what cause the double count: the result is twice the actual value (12 instead of 6).
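One hedged way to avoid counting the same page twice is to scan whole indirect objects (`N G obj ... endobj`) instead of bare `/Type /Page` strings, and keep only the last definition of each object number. This rests on an assumption that cannot be checked without the file: that the duplicated pages are earlier and later definitions of the same objects, as an incremental update or a rewriting tool such as PaperPort might produce. The class and method names are hypothetical, and this is a sketch, not a PDF parser.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

static class PdfPageObjects
{
    // One uncompressed indirect object: "<number> <generation> obj ... endobj".
    // Objects packed inside compressed object streams are not visible to this scan.
    static readonly Regex IndirectObject = new Regex(
        @"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", RegexOptions.Singleline);

    // "/Type /Page" or "/Type/Page", but not "/Type /Pages".
    static readonly Regex PageType = new Regex(@"/Type\s*/Page(?![A-Za-z0-9])");

    public static int CountPages(string pdfText)
    {
        // Key is "number generation". A later definition overwrites an earlier one,
        // roughly mimicking how an incremental update replaces an object.
        var isPage = new Dictionary<string, bool>();
        foreach (Match m in IndirectObject.Matches(pdfText))
        {
            string id = m.Groups[1].Value + " " + m.Groups[2].Value;
            isPage[id] = PageType.IsMatch(m.Groups[3].Value);
        }
        return isPage.Values.Count(v => v);
    }

    public static int CountPagesInFile(string path)
    {
        // ISO-8859-1 maps each byte to one char, so keywords survive intact.
        string text = File.ReadAllText(path, Encoding.GetEncoding("ISO-8859-1"));
        return CountPages(text);
    }
}
```

Called as `PdfPageObjects.CountPagesInFile(@"C:\test.pdf")`. If the duplicated pages turn out to use different object numbers, or the pages live in compressed object streams (PDF 1.5 and later), a text scan cannot tell which copies are live; in that case the page tree has to be followed from the trailer's `/Root` to `/Pages` and its `/Kids`.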

Is there a way to handle such files? This is a file-specific issue; it does not occur with all files.

Thanks

Wrong page count in PDF file

0 answers