Question

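Below is the method I use to extract images from a PDF, but the subtype is always null. I am using the new iText 7 library. If anyone has worked with the new library, please advise.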

    public static string ExtractImageFromPDF(string sourcePdf)
    {            
        PdfReader reader = new PdfReader(sourcePdf);
        try
        {
            PdfDocument document = new PdfDocument(reader);

            for (int pageNumber = 1; pageNumber <= document.GetNumberOfPages(); pageNumber++)
            {
                PdfDictionary obj = (PdfDictionary)document.GetPdfObject(pageNumber);

                if (obj != null && obj.IsStream())
                {
                    PdfDictionary pd = (PdfDictionary)obj;
                    if (pd.ContainsKey(PdfName.Subtype) && pd.Get(PdfName.Subtype).ToString() == "/Image")
                    {
                        string filter = pd.Get(PdfName.Filter).ToString();
                        string width = pd.Get(PdfName.Width).ToString();
                        string height = pd.Get(PdfName.Height).ToString();
                        string bpp = pd.Get(PdfName.BitsPerComponent).ToString();
                        string extent = ".";
                        byte[] img = null;
                        switch (filter)
                        {
                            case "/FlateDecode":
                                byte[] arr = FlateDecodeFilter.FlateDecode(null, true);
                                Bitmap bmp = new Bitmap(Int32.Parse(width), Int32.Parse(height), PixelFormat.Format24bppRgb);
                                BitmapData bmd = bmp.LockBits(new Rectangle(0, 0, Int32.Parse(width), Int32.Parse(height)), ImageLockMode.WriteOnly,
                                    PixelFormat.Format24bppRgb);
                                Marshal.Copy(arr, 0, bmd.Scan0, arr.Length);
                                bmp.UnlockBits(bmd);
                                bmp.Save("d:\\pdf\\bmp1.png", ImageFormat.Png);
                                break;
                            case "/CCITTFaxDecode":
                                break;
                            default:
                                break;
                        }
                    }
                }
            }
        }
        catch
        {
            throw;
        }
        return "";
    }

Answer 1

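What do you see when you use QuickWatch on the pd value? The iText 7 documentation states that it is a dictionary, so perhaps you can inspect the keys it actually contains and find the entry you are looking for.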

    PdfDictionary pd = (PdfDictionary)obj;

The documentation can be found here: https://api.itextpdf.com/iText7/dotnet/7.1.8/classi_text_1_1_kernel_1_1_pdf_1_1_pdf_dictionary.html
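As a rough sketch of that kind of inspection without the debugger, the helper below prints every key/value pair of a dictionary. The class and method names (PdfDictionaryDump, PrintEntries) are made up for illustration; KeySet() and Get() are the PdfDictionary members used.

```csharp
using System;
using iText.Kernel.Pdf;

public static class PdfDictionaryDump   // hypothetical helper name
{
    // Writes each entry of the dictionary to the console so you can see
    // whether /Subtype, /Filter, /Width etc. are actually present.
    public static void PrintEntries(PdfDictionary pd)
    {
        foreach (PdfName key in pd.KeySet())
        {
            // Get() resolves indirect references before returning the value.
            Console.WriteLine($"{key} = {pd.Get(key)}");
        }
    }
}
```

Calling PdfDictionaryDump.PrintEntries(pd) right after the cast should show quickly whether the object is a page, a font, an image stream, or something else.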

Answer 2

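The idea behind your method is to check every indirect object in the document to see whether it is an image XObject and, if it is, to extract the image data it contains.

In practice, however, you only iterate over the values 1 .. document.GetNumberOfPages() as object numbers, i.e. over only a subset of the document's indirect objects!

A PDF contains more indirect objects than pages, usually many more.

So iterate up to document.GetNumberOfPdfObjects() - 1 instead.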
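A minimal sketch of the corrected loop might look like the following. It is an illustration under assumptions rather than a definitive implementation: the class name, method name, and output file naming are invented here, and it leans on PdfImageXObject to decode the stream instead of the manual FlateDecode/Bitmap handling in the question. Images with an unsupported filter or color space may cause GetImageBytes() to throw.

```csharp
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Xobject;

public static class PdfImageExtractor   // hypothetical helper name
{
    // Walks all indirect objects (not just 1..page count) and saves every
    // stream whose /Subtype is /Image. Returns the number of files written.
    public static int ExtractImages(string sourcePdf, string outputDir)
    {
        int count = 0;
        PdfDocument document = new PdfDocument(new PdfReader(sourcePdf));
        try
        {
            // Object number 0 is reserved, so start at 1.
            for (int objNum = 1; objNum < document.GetNumberOfPdfObjects(); objNum++)
            {
                PdfObject obj = document.GetPdfObject(objNum);
                if (obj == null || !obj.IsStream())
                    continue;

                PdfStream stream = (PdfStream)obj;
                if (!PdfName.Image.Equals(stream.GetAsName(PdfName.Subtype)))
                    continue;

                // Let iText decode the image data and pick a file extension.
                PdfImageXObject image = new PdfImageXObject(stream);
                byte[] bytes = image.GetImageBytes();
                string ext = image.IdentifyImageFileExtension();

                // Assumed naming scheme: one file per object number.
                File.WriteAllBytes(Path.Combine(outputDir, $"image-{objNum}.{ext}"), bytes);
                count++;
            }
        }
        finally
        {
            document.Close();
        }
        return count;
    }
}
```

Unlike the code in the question, this checks IsStream() before casting, so non-dictionary objects such as arrays or numbers no longer cause an invalid cast.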

How to extract images from a PDF using iText 7 in C#