Question

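I'm trying to compress a PDF using iTextSharp. Many pages contain color images stored as JPEGs (DCTDECODE), so I'm converting them to black and white PNGs and replacing them in the document (for black and white content, a PNG is much smaller than a JPG).

I have the following methods: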

    private static bool TryCompressPdfImages(PdfReader reader)
    {
        try
        {
            int n = reader.XrefSize;
            for (int i = 0; i < n; i++)
            {
                PdfObject obj = reader.GetPdfObject(i);
                if (obj == null || !obj.IsStream())
                {
                    continue;
                }

                var dict = (PdfDictionary)PdfReader.GetPdfObject(obj);
                var subType = (PdfName)PdfReader.GetPdfObject(dict.Get(PdfName.SUBTYPE));
                if (!PdfName.IMAGE.Equals(subType))
                {
                    continue;
                }

                var stream = (PRStream)obj;
                try
                {
                    var image = new PdfImageObject(stream);

                    Image img = image.GetDrawingImage();
                    if (img == null) continue;

                    using (img)
                    {
                        int width = img.Width;
                        int height = img.Height;

                        using (var msImg = new MemoryStream())
                        using (var bw = img.ToBlackAndWhite())
                        {
                            bw.Save(msImg, ImageFormat.Png);
                            msImg.Position = 0;
                            stream.SetData(msImg.ToArray(), false, PdfStream.NO_COMPRESSION);
                            stream.Put(PdfName.TYPE, PdfName.XOBJECT);
                            stream.Put(PdfName.SUBTYPE, PdfName.IMAGE);
                            stream.Put(PdfName.FILTER, PdfName.FLATEDECODE);
                            stream.Put(PdfName.WIDTH, new PdfNumber(width));
                            stream.Put(PdfName.HEIGHT, new PdfNumber(height));
                            stream.Put(PdfName.BITSPERCOMPONENT, new PdfNumber(8));
                            stream.Put(PdfName.COLORSPACE, PdfName.DEVICERGB);
                            stream.Put(PdfName.LENGTH, new PdfNumber(msImg.Length));
                        }
                    }
                }
                catch (Exception ex)
                {
                    Trace.TraceError(ex.ToString());
                }
                finally
                {
                    // may or may not help      
                    reader.RemoveUnusedObjects();
                }
            }
            return true;
        }
        catch (Exception ex)
        {
            Trace.TraceError(ex.ToString());
            return false;
        }
    }

    public static Image ToBlackAndWhite(this Image image)
    {
        image = new Bitmap(image);
        using (Graphics gr = Graphics.FromImage(image))
        {
            var grayMatrix = new[]
            {
                new[] {0.299f, 0.299f, 0.299f, 0, 0},
                new[] {0.587f, 0.587f, 0.587f, 0, 0},
                new[] {0.114f, 0.114f, 0.114f, 0, 0},
                new [] {0f, 0, 0, 1, 0},
                new [] {0f, 0, 0, 0, 1}
            };

            var ia = new ImageAttributes();
            ia.SetColorMatrix(new ColorMatrix(grayMatrix));
            ia.SetThreshold((float)0.8); // Change this threshold as needed
            var rc = new Rectangle(0, 0, image.Width, image.Height);
            gr.DrawImage(image, rc, 0, 0, image.Width, image.Height, GraphicsUnit.Pixel, ia);
        }
        return image;
    }

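I have tried many combinations of COLORSPACE and BITSPERCOMPONENT, but I always get "Insufficient data for an image", "Out of memory", or "An error exists on this page" when trying to open the resulting PDF, so I must be doing something wrong. I'm pretty sure FLATEDECODE is the right thing to use.

Any assistance would be much appreciated.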

Answer 1

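The problem:

You have a PDF with a colored JPG. For instance: image.pdf

If you look inside this PDF, you'll see that the filter of the image stream is /DCTDecode and the color space is /DeviceRGB.

Now you want to replace the image in the PDF, so that the result looks like this: image_replaced.pdf

In this PDF, the filter is /FlateDecode and the color space has changed to /DeviceGray.

In the conversion process, you want to use the PNG format.

The example:

I have made an example that performs this conversion: ReplaceImage

I will explain this example step by step:

Step 1: finding the image

In my example, I know that there is only one image, so I retrieve the PRStream holding the image dictionary and the image bytes in a quick and dirty way.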

    PdfReader reader = new PdfReader(src);
    PdfDictionary page = reader.getPageN(1);
    PdfDictionary resources = page.getAsDict(PdfName.RESOURCES);
    PdfDictionary xobjects = resources.getAsDict(PdfName.XOBJECT);
    PdfName imgRef = xobjects.getKeys().iterator().next();
    PRStream stream = (PRStream) xobjects.getAsStream(imgRef);

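Starting from the page dictionary of page 1, I go through /Resources to the /XObject dictionary. I take the first XObject I encounter, assuming that it is an image, and I get that image as a PRStream object.

Your code is better than mine, but this part of the code isn't relevant to your question and it works in the context of my example, so let's ignore the fact that it won't work for other PDFs. What you really care about are steps 2 and 3.

Step 2: converting the colored JPG into a black and white PNG

Let's write a method that takes a PdfImageObject and converts it into an Image object that is changed to gray and stored as a PNG: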

    public static Image makeBlackAndWhitePng(PdfImageObject image) throws IOException, DocumentException {
        BufferedImage bi = image.getBufferedImage();
        BufferedImage newBi = new BufferedImage(bi.getWidth(), bi.getHeight(), BufferedImage.TYPE_USHORT_GRAY);
        newBi.getGraphics().drawImage(bi, 0, 0, null);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ImageIO.write(newBi, "png", baos);
        return Image.getInstance(baos.toByteArray());
    }

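We convert the original image into a black and white image using standard BufferedImage manipulations: we draw the original image bi onto a new image newBi of type TYPE_USHORT_GRAY.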
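Note that TYPE_USHORT_GRAY produces 16-bit grayscale rather than strictly two-tone output. If a pure black and white image (1 bit per pixel) is what is wanted, a thresholding pass is one option. The following is only a minimal sketch using the standard library; the class name BilevelPng, the helpers toBilevel and toPng, and the threshold of 204 (roughly the 0.8 used in the question) are illustrative assumptions and not part of the ReplaceImage example.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

class BilevelPng {

    /** Hypothetical helper: thresholds a color image to 1 bit per pixel (0 = black, 1 = white). */
    static BufferedImage toBilevel(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        WritableRaster raster = out.getRaster();
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Same luma weights as the color matrix in the question.
                long luma = Math.round(0.299 * r + 0.587 * g + 0.114 * b);
                raster.setSample(x, y, 0, luma >= threshold ? 1 : 0);
            }
        }
        return out;
    }

    /** Encodes the image as PNG file bytes. */
    static byte[] toPng(BufferedImage img) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        if (!ImageIO.write(img, "png", baos)) {
            throw new IOException("no PNG writer available");
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        BufferedImage src = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        src.setRGB(0, 0, 0xFFFFFF); // white pixel
        src.setRGB(1, 0, 0x202020); // dark gray pixel
        BufferedImage bw = toBilevel(src, 204);
        System.out.println("PNG size in bytes: " + toPng(bw).length);
    }
}
```

The resulting BufferedImage could be passed on in the same way as newBi above; whether the extra size reduction is worth the loss of gray levels depends on the scans.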

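Once this is done, you want the image bytes in PNG format. This is also done using standard ImageIO functionality: we just write the BufferedImage to a byte array, telling ImageIO that we want "png".

We can use the resulting bytes to create an Image object.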

    Image img = makeBlackAndWhitePng(new PdfImageObject(stream));

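Now we have an iText Image object, but note that the image bytes stored in this Image object are no longer in PNG format. As already mentioned in the comments, PNG is not supported in PDF. iText will change the image bytes into a format that is supported in PDF (for more details, see section 4.2.6.2 of The ABC of PDF).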
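To make the difference concrete: a PNG file is a container (signature, chunks, per-row filter bytes), whereas a stream labelled /FlateDecode with no predictor is expected to inflate directly to the raw image samples. The sketch below uses only the standard library to contrast the two; the class FlateVersusPng and its helper names are hypothetical, and it assumes an 8-bit gray image.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
import javax.imageio.ImageIO;

class FlateVersusPng {

    /** Raw 8-bit gray samples, row by row: exactly width * height bytes. */
    static byte[] graySamples(BufferedImage gray) {
        int w = gray.getWidth(), h = gray.getHeight();
        byte[] samples = new byte[w * h];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                samples[y * w + x] = (byte) gray.getRaster().getSample(x, y, 0);
            }
        }
        return samples;
    }

    /** zlib-compresses the data, the format a /FlateDecode stream carries. */
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Reverses deflate(); stops early if the input runs out. */
    static byte[] inflate(byte[] data) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) break;
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    static byte[] toPng(BufferedImage img) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ImageIO.write(img, "png", baos);
        return baos.toByteArray();
    }

    /** True if the bytes begin with the PNG file signature. */
    static boolean hasPngSignature(byte[] b) {
        return b.length >= 4 && (b[0] & 0xFF) == 0x89 && b[1] == 'P' && b[2] == 'N' && b[3] == 'G';
    }

    public static void main(String[] args) throws Exception {
        int w = 4, h = 3;
        BufferedImage gray = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                gray.getRaster().setSample(x, y, 0, (x + y) * 40);
        byte[] flate = deflate(graySamples(gray));
        System.out.println("inflated samples: " + inflate(flate).length + " bytes, width * height = " + (w * h));
        System.out.println("PNG bytes carry a PNG signature: " + hasPngSignature(toPng(gray)));
    }
}
```

Storing the output of toPng in a stream marked /FlateDecode, as the code in the question does, hands the viewer a PNG signature where it expects zlib data, which would plausibly explain the errors reported there.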

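Step 3: replacing the original image stream with the new image stream

We now have an Image object, but what we really need is to replace the original image stream with a new one, and we also need to adapt the image dictionary: /DCTDecode will change into /FlateDecode, /DeviceRGB will change into /DeviceGray, and the value of /Length will also be different.

You are creating the image stream and its dictionary manually. That's brave. I leave this job to iText's PdfImage object:

    PdfImage image = new PdfImage(makeBlackAndWhitePng(new PdfImageObject(stream)), "", null);

PdfImage extends PdfStream, so I can now replace the original stream with this new stream:

    public static void replaceStream(PRStream orig, PdfStream stream) throws IOException {
        orig.clear();
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        stream.writeContent(baos);
        orig.setData(baos.toByteArray(), false);
        for (PdfName name : stream.getKeys()) {
            orig.put(name, stream.get(name));
        }
    }

The order in which you do things here is important. You don't want the setData() method to tamper with the length and the filter.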

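Step 4: persisting the document after replacing the stream

I guess this part isn't hard:

    replaceStream(stream, image);
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
    stamper.close();
    reader.close();

Caveat:

I am not a C# developer. I know PDF inside out and I know Java.

If your problem is caused in step 2, then you'll have to post another question asking how to convert a colored JPEG image into a black and white PNG image.
If your problem is caused in step 3 (for instance because you are using /DeviceRGB instead of /DeviceGray), then this answer will solve your problem.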
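An "insufficient data" style of error is consistent with an image dictionary that promises more bytes than the decoded stream delivers. As a rough arithmetic sketch, assuming the PDF convention that every image row is padded to a whole number of bytes, the expected size can be estimated as follows; the class ImageDataSize and the method expectedBytes are hypothetical names used only for illustration.

```java
class ImageDataSize {

    /**
     * Number of bytes the decoded image data should contain.
     * components: 3 for /DeviceRGB, 1 for /DeviceGray.
     */
    static long expectedBytes(int width, int height, int components, int bitsPerComponent) {
        long bitsPerRow = (long) width * components * bitsPerComponent;
        long bytesPerRow = (bitsPerRow + 7) / 8; // each row is padded to a byte boundary
        return bytesPerRow * height;
    }

    public static void main(String[] args) {
        System.out.println(expectedBytes(100, 50, 3, 8)); // /DeviceRGB, 8 bits: 15000
        System.out.println(expectedBytes(100, 50, 1, 8)); // /DeviceGray, 8 bits: 5000
        System.out.println(expectedBytes(100, 50, 1, 1)); // /DeviceGray, 1 bit: 650
    }
}
```

With /DeviceRGB and 8 bits per component declared, a viewer would look for three times as many bytes as gray data of the same dimensions provides, so the color space, bit depth, and data have to agree.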
