Adding text to a PDF and flattening it

Time: 2011-09-22 18:40:00

Tags: asp.net pdf itextsharp

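I'm developing a web application that displays PDFs and lets users order copies of the documents. We want to add text such as "Unpaid" or "Sample" on the fly when the PDF is displayed. I've done this with iTextSharp. However, the page image is easily separated from the watermark text and extracted with any of a number of free programs.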

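How can I add a watermark to the pages of a PDF and flatten the page image and the watermark together, so that the watermark becomes part of the PDF page image and cannot be removed (unless the person is willing to resort to Photoshop)?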

2 answers:

Answer 0 (score: 2)

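If I were you I'd go down a different path. Using iTextSharp (or another library), extract each page of a given document into a folder. Then use a program that you can batch (Ghostscript, Photoshop, maybe GIMP) to convert each page to an image. Then write your overlay text onto the images. Finally, use iTextSharp to combine all of the images in each folder back into a PDF.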

I know this sounds like a pain, but I assume you should only have to do it once per document.
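
The last two steps above (writing the overlay text onto each page image, then combining the images back into a PDF) might look roughly like the sketch below. It assumes the pages have already been rasterized to PNG files whose names sort in page order. The StampAndCombine name, the folder layout, the font, and the stamp color are placeholders of mine, not anything from the question, and I have only checked this against the iTextSharp 5.x API.

```vbnet
Option Explicit On
Option Strict On

Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf

Module WatermarkSketch

    ''//Hypothetical helper: burns stampText into every PNG in imageFolder and writes one PDF page per image
    Sub StampAndCombine(ByVal imageFolder As String, ByVal outputPdf As String, ByVal stampText As String)
        Dim pagePaths = Directory.GetFiles(imageFolder, "*.png")
        Array.Sort(pagePaths, StringComparer.Ordinal)
        If pagePaths.Length = 0 Then Throw New ArgumentException("No page images found in " & imageFolder)

        Dim doc As New Document()
        Dim isFirstPage = True
        Using fs As New FileStream(outputPdf, FileMode.Create)
            PdfWriter.GetInstance(doc, fs)
            For Each pagePath In pagePaths
                Dim img As Image
                Using source As New System.Drawing.Bitmap(pagePath)
                    ''//Draw onto a fresh 24bpp bitmap because Graphics.FromImage rejects indexed pixel formats
                    Using page As New System.Drawing.Bitmap(source.Width, source.Height, System.Drawing.Imaging.PixelFormat.Format24bppRgb)
                        Using g = System.Drawing.Graphics.FromImage(page)
                            g.DrawImage(source, 0, 0, source.Width, source.Height)
                            Using f As New System.Drawing.Font("Arial", source.Height / 10.0F, System.Drawing.FontStyle.Bold, System.Drawing.GraphicsUnit.Pixel), b As New System.Drawing.SolidBrush(System.Drawing.Color.FromArgb(96, 255, 0, 0))
                                ''//Rotate around the page center so the stamp runs diagonally
                                g.TranslateTransform(source.Width / 2.0F, source.Height / 2.0F)
                                g.RotateTransform(-45)
                                Dim size = g.MeasureString(stampText, f)
                                g.DrawString(stampText, f, b, -size.Width / 2, -size.Height / 2)
                            End Using
                        End Using
                        ''//The stamp is now ordinary pixels in the page image, not a separate PDF object
                        img = Image.GetInstance(page, System.Drawing.Imaging.ImageFormat.Png)
                    End Using
                End Using

                ''//Size the PDF page to the image, treating one pixel as one point
                doc.SetPageSize(New Rectangle(img.Width, img.Height))
                If isFirstPage Then
                    doc.Open()
                    isFirstPage = False
                Else
                    doc.NewPage()
                End If
                img.SetAbsolutePosition(0, 0)
                doc.Add(img)
            Next
            doc.Close()
        End Using
    End Sub

End Module
```

A call such as StampAndCombine("C:\pages", "C:\out\stamped.pdf", "SAMPLE") (both paths made up) would then produce a PDF whose pages are single images. Because one pixel becomes one point, pages rasterized at a high resolution come out physically larger than the originals; scaling the image and the page size down by the same factor would correct that.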

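If you don't want to go down this path, let me get you started on what you'd need to do to extract the images. Much of the code below comes from this post. At the end of the code I save the images to the desktop. Since you have the raw bytes, you could just as easily pump them into a System.Drawing.Image object and write them back into a new PdfWriter object, which it sounds like you're familiar with. Below is a complete WinForms application targeting iTextSharp 5.1.1.0.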

Option Explicit On
Option Strict On

Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports System.IO
Imports System.Runtime.InteropServices

Public Class Form1

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        ''//File to process
        Dim InputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "SampleImage.pdf")

        ''//Bind a reader to our PDF
        Dim R As New PdfReader(InputFile)

        ''//Setup some variable to use below
        Dim bytes() As Byte
        Dim obj As PdfObject
        Dim pd As PdfDictionary
        Dim filter, width, height, bpp As String
        Dim pixelFormat As System.Drawing.Imaging.PixelFormat
        Dim bmp As System.Drawing.Bitmap
        Dim bmd As System.Drawing.Imaging.BitmapData

        ''//Loop through all of the references in the file
        Dim xo = R.XrefSize
        For I = 0 To xo - 1
            ''//Get the object
            obj = R.GetPdfObject(I)
            ''//Make sure we have something and that it is a stream
            If (obj IsNot Nothing) AndAlso obj.IsStream() Then
                ''//Cast it to a dictionary object
                pd = DirectCast(obj, PdfDictionary)
                ''//See if it has a subtype property that is set to /IMAGE
                If pd.Contains(PdfName.SUBTYPE) AndAlso pd.Get(PdfName.SUBTYPE).ToString() = PdfName.IMAGE.ToString() Then
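                    ''//Skip images that have no /Filter entry; the Select Case below needs a filter name
                    If pd.Get(PdfName.FILTER) Is Nothing Then Continue For
                    ''//Grab various properties of the image
                    filter = pd.Get(PdfName.FILTER).ToString()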
                    width = pd.Get(PdfName.WIDTH).ToString()
                    height = pd.Get(PdfName.HEIGHT).ToString()
                    bpp = pd.Get(PdfName.BITSPERCOMPONENT).ToString()

                    ''//Grab the raw bytes of the image
                    bytes = PdfReader.GetStreamBytesRaw(DirectCast(obj, PRStream))

                    ''//Images can be encoded in various ways. /DCTDecode is the simplest because it is essentially JPEG and can be treated as such.
                    ''//If your PDFs contain the other types you will need to figure out how to handle those on your own
                    Select Case filter
                        Case PdfName.ASCII85DECODE.ToString()
                            Throw New NotImplementedException("Decoding this filter has not been implemented")
                        Case PdfName.ASCIIHEXDECODE.ToString()
                            Throw New NotImplementedException("Decoding this filter has not been implemented")
                        Case PdfName.FLATEDECODE.ToString()
                            ''//This code from https://stackoverflow.com/questions/802269/itextsharp-extract-images/1220959#1220959
                            bytes = pdf.PdfReader.FlateDecode(bytes, True)
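                            ''//Note: /BitsPerComponent counts bits per color component, so an 8-bit RGB image reports 8 (not 24) and lands in Case Else
                            Select Case Integer.Parse(bpp)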
                                Case 1
                                    pixelFormat = Drawing.Imaging.PixelFormat.Format1bppIndexed
                                Case 24
                                    pixelFormat = Drawing.Imaging.PixelFormat.Format24bppRgb
                                Case Else
                                    Throw New Exception("Unknown pixel format " + bpp)
                            End Select
                            bmp = New System.Drawing.Bitmap(Int32.Parse(width), Int32.Parse(height), pixelFormat)
                            bmd = bmp.LockBits(New System.Drawing.Rectangle(0, 0, Int32.Parse(width), Int32.Parse(height)), System.Drawing.Imaging.ImageLockMode.WriteOnly, pixelFormat)
                            Marshal.Copy(bytes, 0, bmd.Scan0, bytes.Length)
                            bmp.UnlockBits(bmd)
                            Using ms As New MemoryStream
                                bmp.Save(ms, System.Drawing.Imaging.ImageFormat.Jpeg)
                                ''//ToArray returns only the bytes written; GetBuffer can include unused trailing capacity
                                bytes = ms.ToArray()
                            End Using
                        Case PdfName.LZWDECODE.ToString()
                            Throw New NotImplementedException("Decoding this filter has not been implemented")
                        Case PdfName.RUNLENGTHDECODE.ToString()
                            Throw New NotImplementedException("Decoding this filter has not been implemented")
                        Case PdfName.DCTDECODE.ToString()
                            ''//Bytes should be raw JPEG so they should not need to be decoded, hopefully
                        Case PdfName.CCITTFAXDECODE.ToString()
                            Throw New NotImplementedException("Decoding this filter has not been implemented")
                        Case PdfName.JBIG2DECODE.ToString()
                            Throw New NotImplementedException("Decoding this filter has not been implemented")
                        Case PdfName.JPXDECODE.ToString()
                            Throw New NotImplementedException("Decoding this filter has not been implemented")
                        Case Else
                            Throw New ApplicationException("Unknown filter found : " & filter)
                    End Select

                    ''//At this point the byte array should contain valid JPEG data, so write it to disk
                    My.Computer.FileSystem.WriteAllBytes(Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), I & ".jpg"), bytes, False)
                End If
            End If

        Next

        Me.Close()
    End Sub
End Class

Answer 1 (score: 1)

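The entire page has to be rendered as an image. Otherwise you end up with "text objects" (the individual words/letters of the text) and a watermark object (the overlay image), which will always be distinct, separate parts of the page.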
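
Rendering whole pages to images needs an external rasterizer. As one possible sketch, the code below shells out to Ghostscript from VB.NET. It assumes the 64-bit Windows console build (gswin64c) is installed and on the PATH; the RasterizePages name, the 150 dpi resolution, and the output file naming are my own choices for illustration.

```vbnet
Option Explicit On
Option Strict On

Imports System.Diagnostics
Imports System.IO

Module RasterizeSketch

    ''//Hypothetical helper: asks Ghostscript to render every page of inputPdf to a PNG in outputFolder
    Sub RasterizePages(ByVal inputPdf As String, ByVal outputFolder As String)
        Directory.CreateDirectory(outputFolder)

        ''//Ghostscript replaces %03d with the page number: page-001.png, page-002.png, ...
        Dim outputPattern = Path.Combine(outputFolder, "page-%03d.png")

        Dim psi As New ProcessStartInfo()
        ''//Assumption: the Ghostscript console executable is reachable under this name
        psi.FileName = "gswin64c"
        ''//png16m writes 24-bit color PNGs; -r150 renders at 150 dpi
        psi.Arguments = String.Format("-dBATCH -dNOPAUSE -sDEVICE=png16m -r150 -sOutputFile=""{0}"" ""{1}""", outputPattern, inputPdf)
        psi.UseShellExecute = False
        psi.CreateNoWindow = True

        Using p = Process.Start(psi)
            p.WaitForExit()
            If p.ExitCode <> 0 Then Throw New ApplicationException("Ghostscript exited with code " & p.ExitCode)
        End Using
    End Sub

End Module
```

Each resulting PNG holds the page's text and graphics as plain pixels, so a watermark drawn onto it afterwards cannot be separated from the page content the way a PDF overlay object can. The trade-off is that the text on the page is no longer selectable or searchable.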