Question

I am using the following code to convert a PDF to an image with iTextSharp.

private static System.Drawing.Image ExtractImages(String PDFSourcePath)
{
    iTextSharp.text.pdf.RandomAccessFileOrArray RAFObj = null;
    iTextSharp.text.pdf.PdfReader PDFReaderObj = null;
    iTextSharp.text.pdf.PdfObject PDFObj = null;
    iTextSharp.text.pdf.PdfStream PDFStremObj = null;

    try
    {
        RAFObj = new iTextSharp.text.pdf.RandomAccessFileOrArray(PDFSourcePath);
        PDFReaderObj = new iTextSharp.text.pdf.PdfReader(RAFObj, null);

        for (int i = 0; i <= PDFReaderObj.XrefSize - 1; i++)
        {
            PDFObj = PDFReaderObj.GetPdfObject(i);

            if ((PDFObj != null) && PDFObj.IsStream())
            {
                PDFStremObj = (iTextSharp.text.pdf.PdfStream)PDFObj;
                iTextSharp.text.pdf.PdfObject subtype = PDFStremObj.Get(iTextSharp.text.pdf.PdfName.SUBTYPE);

                if ((subtype != null) && subtype.ToString() == iTextSharp.text.pdf.PdfName.IMAGE.ToString())
                {
                    byte[] bytes = iTextSharp.text.pdf.PdfReader.GetStreamBytesRaw((iTextSharp.text.pdf.PRStream)PDFStremObj);

                    if ((bytes != null))
                    {
                        try
                        {
                            System.IO.MemoryStream MS = new System.IO.MemoryStream(bytes);
                            Bitmap ImgPDF = new Bitmap(MS);
                            return ImgPDF;
                        }
                        catch (Exception)
                        {

                        }

                    }
                }
            }
        }

        RAFObj.Close();
        PDFReaderObj.Close();
        return null;
    }
    catch (Exception ex)
    {
        throw new Exception(ex.Message);
    }

}

It works for some PDF files, but for others it throws an exception at this line:

Bitmap ImgPDF = new Bitmap(MS);

Parameter is not valid.

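I am really confused about why this happens. Is it caused by differences in the files' security settings, or by something else? Please help me solve this.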

Answer 1

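You need to check the stream's /Filter entry to see which image format a given image uses. It may be a standard image format:

DCTDecode (JPEG)
JPXDecode (JPEG 2000)
JBIG2Decode (JBIG2 is a black-and-white-only format)
CCITTFaxDecode (fax format; PDF supports Group 3 and Group 4)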
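As a rough illustration of that check, here is a minimal sketch that maps a filter name to a handling strategy. The type and method names (`ImageDecodePath`, `PdfImageFilters.Classify`) are hypothetical and not part of iTextSharp. The sketch assumes you pass in the text of the /Filter entry, for example the result of `PDFStremObj.Get(iTextSharp.text.pdf.PdfName.FILTER)` converted with `ToString()`. It also assumes a single filter name, although /Filter may be an array of names. FlateDecode, LZWDecode and RunLengthDecode are not in the list above; they are general compression filters from the PDF specification that leave raw samples once decoded.

```csharp
using System;

// Hypothetical names for illustration only; not part of iTextSharp.
public enum ImageDecodePath
{
    DirectBitmap,          // the stream bytes are a complete JPEG file
    NeedsExternalDecoder,  // GDI+ cannot read this data as-is
    RawSamples,            // pixel samples; build the image from /Width, /Height, etc.
    Unknown
}

public static class PdfImageFilters
{
    // filterName: text of the /Filter entry, e.g. "/DCTDecode" or "DCTDecode".
    public static ImageDecodePath Classify(string filterName)
    {
        // No /Filter at all means the stream holds uncompressed samples.
        if (string.IsNullOrEmpty(filterName))
            return ImageDecodePath.RawSamples;

        switch (filterName.TrimStart('/'))
        {
            case "DCTDecode":
                return ImageDecodePath.DirectBitmap;
            case "JPXDecode":
            case "JBIG2Decode":
            case "CCITTFaxDecode":
                return ImageDecodePath.NeedsExternalDecoder;
            case "FlateDecode":
            case "LZWDecode":
            case "RunLengthDecode":
                return ImageDecodePath.RawSamples;
            default:
                return ImageDecodePath.Unknown;
        }
    }
}
```

With something like this in place, `new Bitmap(MS)` would only be attempted for the `DirectBitmap` case. Even then it may fail for some JPEG variants, so keeping the try/catch around it is still reasonable.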

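For anything other than these formats, you need to take the raw bytes (as you already do) and build the image yourself from the image stream's width, height, bits per component, and number of color components (the color space can be CMYK, indexed, RGB, or something else), as defined in section 8.9 of the ISO PDF specification (freely available).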
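To make the "build the image yourself" step concrete, here is a minimal sketch for the simplest case only: 8 bits per component, three components (DeviceRGB), and samples that have already been decompressed. The class and method names (`RawImageBuilder.FromRgb8`) are hypothetical. The sketch assumes you have read the width and height from the stream's /Width and /Height entries yourself. Other bit depths and color spaces (CMYK, indexed, and so on) need their own conversion and are not covered.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration only; not part of iTextSharp.
public static class RawImageBuilder
{
    // samples: decoded pixel data, 3 bytes per pixel in R, G, B order, rows top to bottom.
    public static Bitmap FromRgb8(byte[] samples, int width, int height)
    {
        if (samples == null) throw new ArgumentNullException("samples");
        if (width <= 0 || height <= 0)
            throw new ArgumentOutOfRangeException("width", "Width and height must be positive.");

        int srcStride = width * 3; // PDF rows of 8-bit RGB have no padding
        if (samples.Length < srcStride * height)
            throw new ArgumentException("Not enough sample data for the given size.", "samples");

        Bitmap bmp = new Bitmap(width, height, PixelFormat.Format24bppRgb);
        BitmapData data = bmp.LockBits(new Rectangle(0, 0, width, height),
                                       ImageLockMode.WriteOnly,
                                       PixelFormat.Format24bppRgb);
        try
        {
            byte[] row = new byte[srcStride];
            for (int y = 0; y < height; y++)
            {
                int src = y * srcStride;
                for (int x = 0; x < width; x++)
                {
                    // GDI+ stores 24bpp pixels as B, G, R, so swap the first and third bytes.
                    row[x * 3]     = samples[src + x * 3 + 2];
                    row[x * 3 + 1] = samples[src + x * 3 + 1];
                    row[x * 3 + 2] = samples[src + x * 3];
                }
                // Bitmap rows are padded to 4 bytes, so advance by data.Stride, not srcStride.
                Marshal.Copy(row, 0, IntPtr.Add(data.Scan0, y * data.Stride), srcStride);
            }
        }
        finally
        {
            bmp.UnlockBits(data);
        }
        return bmp;
    }
}
```

If the stream is compressed with a general filter such as FlateDecode, the bytes from `GetStreamBytesRaw` are still compressed. The decoded samples would have to come from somewhere else, for example `PdfReader.GetStreamBytes`, before a helper like this could use them.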

So in some cases your code will work, but in others it will fail with the exception you mentioned.

Answer 2

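I think I ran into the same problem. In my case, the exception was thrown when the image was in JBIG2 format. The image stream's width and height were set to 0, yet the stream did contain some bytes. Unfortunately, I did not solve the problem.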

Extracting bitmap images from a PDF using iTextSharp in C#
