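I am developing a web application that displays PDFs and lets users order copies of the documents. We want to add text such as "Unpaid" or "Sample" dynamically when the PDF is displayed. I have done this with iTextSharp. However, the page image is easily separated from the watermark text and can be extracted with a variety of free software programs.
How can I add a watermark to the pages of a PDF, but flatten the page image and the watermark together so that the watermark becomes part of the PDF page image, which prevents the watermark from being removed (unless the person wants to resort to Photoshop)?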
Answer 0: (score: 2)
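If I were you, I would go a different route. Use iTextSharp (or another library) to extract each page of a given document into a folder. Then use a program that supports batch processing (Ghostscript, Photoshop, maybe GIMP) to convert each page to an image. Then write your overlay text onto the images. Finally, use iTextSharp to combine all of the images in each folder back into a PDF.
I know this sounds like a pain, but I assume you should only have to do it once per document.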
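The "write the overlay text onto the images" step could be sketched roughly as follows. This is only an illustration using System.Drawing, not code from the answer: the `StampImage` helper, the `pages` and `stamped` folder names, the font, the colour, and the "SAMPLE" text are all assumptions, and the page images are assumed to have already been rendered to JPEG files (for example by Ghostscript).

```vbnet
Option Explicit On
Option Strict On
Option Infer On
Imports System.Drawing
Imports System.Drawing.Imaging
Imports System.IO

Module StampSketch
    ''//Hypothetical helper: draws semi-transparent text diagonally across one page image
    Sub StampImage(ByVal inputFile As String, ByVal outputFile As String, ByVal text As String)
        Using src = Image.FromFile(inputFile)
            ''//Copy into a 24bpp bitmap so Graphics can draw on it even if the source is indexed
            Using bmp As New Bitmap(src.Width, src.Height, PixelFormat.Format24bppRgb)
                bmp.SetResolution(src.HorizontalResolution, src.VerticalResolution)
                Using g = Graphics.FromImage(bmp)
                    g.DrawImage(src, 0, 0, src.Width, src.Height)
                    ''//Font size is tied to the page width (an arbitrary choice for this sketch)
                    Using f As New Font("Arial", bmp.Width / 8.0F, FontStyle.Bold, GraphicsUnit.Pixel)
                        Using b As New SolidBrush(Color.FromArgb(96, Color.Red))
                            ''//Move the origin to the centre of the page and rotate around it
                            g.TranslateTransform(bmp.Width / 2.0F, bmp.Height / 2.0F)
                            g.RotateTransform(-45.0F)
                            Dim sz = g.MeasureString(text, f)
                            g.DrawString(text, f, b, -sz.Width / 2.0F, -sz.Height / 2.0F)
                        End Using
                    End Using
                End Using
                ''//The text is now part of the pixel data of the saved image
                bmp.Save(outputFile, ImageFormat.Jpeg)
            End Using
        End Using
    End Sub

    Sub Main()
        ''//Assumed layout: one JPEG per page in a "pages" folder on the desktop
        Dim folder = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "pages")
        Dim outDir = Path.Combine(folder, "stamped")
        Directory.CreateDirectory(outDir)
        For Each pageFile In Directory.GetFiles(folder, "*.jpg")
            StampImage(pageFile, Path.Combine(outDir, Path.GetFileName(pageFile)), "SAMPLE")
        Next
    End Sub
End Module
```

Because the text is drawn into the bitmap before it is saved, there is no separate watermark object left to strip out; removing it would require editing the image itself.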
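If you don't want to go down that route, let me walk you through what you need to do to extract the images. Most of the code below comes from this post. At the end of the code I save the images to the desktop. Since you have the raw bytes, you could also easily feed them into a System.Drawing.Image object and write them back into a new PdfWriter object, which it sounds like you are familiar with. Below is a full working WinForms application targeting iTextSharp 5.1.1.0: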
    Option Explicit On
    Option Strict On
    Option Infer On

    Imports iTextSharp.text
    Imports iTextSharp.text.pdf
    Imports System.IO
    Imports System.Runtime.InteropServices

    Public Class Form1
        Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
            ''//File to process
            Dim InputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "SampleImage.pdf")
            ''//Bind a reader to our PDF
            Dim R As New PdfReader(InputFile)
            ''//Set up some variables to use below
            Dim bytes() As Byte
            Dim obj As PdfObject
            Dim pd As PdfDictionary
            Dim filter, width, height, bpp As String
            Dim pixelFormat As System.Drawing.Imaging.PixelFormat
            Dim bmp As System.Drawing.Bitmap
            Dim bmd As System.Drawing.Imaging.BitmapData
            ''//Loop through all of the references in the file
            Dim xo = R.XrefSize
            For I = 0 To xo - 1
                ''//Get the object
                obj = R.GetPdfObject(I)
                ''//Make sure we have something and that it is a stream
                If (obj IsNot Nothing) AndAlso obj.IsStream() Then
                    ''//Cast it to a dictionary object
                    pd = DirectCast(obj, PdfDictionary)
                    ''//See if it has a subtype property that is set to /Image
                    If pd.Contains(PdfName.SUBTYPE) AndAlso pd.Get(PdfName.SUBTYPE).ToString() = PdfName.IMAGE.ToString() Then
                        ''//Grab various properties of the image
                        filter = pd.Get(PdfName.FILTER).ToString()
                        width = pd.Get(PdfName.WIDTH).ToString()
                        height = pd.Get(PdfName.HEIGHT).ToString()
                        bpp = pd.Get(PdfName.BITSPERCOMPONENT).ToString()
                        ''//Grab the raw bytes of the image
                        bytes = PdfReader.GetStreamBytesRaw(DirectCast(obj, PRStream))
                        ''//Images can be encoded in various ways. /DCTDecode is the simplest because it's essentially JPEG and can be treated as such.
                        ''//If your PDFs contain the other types you will need to figure out how to handle those on your own
                        Select Case filter
                            Case PdfName.ASCII85DECODE.ToString()
                                Throw New NotImplementedException("Decoding this filter has not been implemented")
                            Case PdfName.ASCIIHEXDECODE.ToString()
                                Throw New NotImplementedException("Decoding this filter has not been implemented")
                            Case PdfName.FLATEDECODE.ToString()
                                ''//This code from https://stackoverflow.com/questions/802269/itextsharp-extract-images/1220959#1220959
                                bytes = PdfReader.FlateDecode(bytes, True)
                                ''//Note: /BitsPerComponent is a per-channel value, so this mapping only covers the cases listed here
                                Select Case Integer.Parse(bpp)
                                    Case 1
                                        pixelFormat = Drawing.Imaging.PixelFormat.Format1bppIndexed
                                    Case 24
                                        pixelFormat = Drawing.Imaging.PixelFormat.Format24bppRgb
                                    Case Else
                                        Throw New Exception("Unknown pixel format " + bpp)
                                End Select
                                bmp = New System.Drawing.Bitmap(Int32.Parse(width), Int32.Parse(height), pixelFormat)
                                bmd = bmp.LockBits(New System.Drawing.Rectangle(0, 0, Int32.Parse(width), Int32.Parse(height)), System.Drawing.Imaging.ImageLockMode.WriteOnly, pixelFormat)
                                Marshal.Copy(bytes, 0, bmd.Scan0, bytes.Length)
                                bmp.UnlockBits(bmd)
                                Using ms As New MemoryStream
                                    bmp.Save(ms, System.Drawing.Imaging.ImageFormat.Jpeg)
                                    ''//ToArray returns only the bytes written; GetBuffer would include unused buffer space
                                    bytes = ms.ToArray()
                                End Using
                            Case PdfName.LZWDECODE.ToString()
                                Throw New NotImplementedException("Decoding this filter has not been implemented")
                            Case PdfName.RUNLENGTHDECODE.ToString()
                                Throw New NotImplementedException("Decoding this filter has not been implemented")
                            Case PdfName.DCTDECODE.ToString()
                                ''//Bytes should be raw JPEG so they should not need to be decoded, hopefully
                            Case PdfName.CCITTFAXDECODE.ToString()
                                Throw New NotImplementedException("Decoding this filter has not been implemented")
                            Case PdfName.JBIG2DECODE.ToString()
                                Throw New NotImplementedException("Decoding this filter has not been implemented")
                            Case PdfName.JPXDECODE.ToString()
                                Throw New NotImplementedException("Decoding this filter has not been implemented")
                            Case Else
                                Throw New ApplicationException("Unknown filter found : " & filter)
                        End Select
                        ''//At this point the byte array should contain valid JPEG data, so write it to disk
                        My.Computer.FileSystem.WriteAllBytes(Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), I & ".jpg"), bytes, False)
                    End If
                End If
            Next
            ''//Release the reader before closing the form
            R.Close()
            Me.Close()
        End Sub
    End Class
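The answer mentions writing the images back through a new PdfWriter. A minimal sketch of that step, assuming the images have been saved as JPEG files, might look like the following. The `ImagesToPdf` helper and its `dpi` parameter are hypothetical names introduced for illustration; the iTextSharp 5.x calls used are `Document`, `PdfWriter.GetInstance`, `Image.GetInstance`, `ScalePercent`, and `SetAbsolutePosition`.

```vbnet
Option Explicit On
Option Strict On
Option Infer On
Imports System.Collections.Generic
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf

Module CombineSketch
    ''//Hypothetical helper: writes each image file as one full-page image in a new PDF.
    ''//dpi is the assumed resolution of the images (iTextSharp treats 1 pixel as 1 point at 100% scale)
    Sub ImagesToPdf(ByVal imageFiles As IEnumerable(Of String), ByVal outputFile As String, ByVal dpi As Single)
        Dim doc As New Document()
        doc.SetMargins(0, 0, 0, 0)
        Using fs As New FileStream(outputFile, FileMode.Create)
            PdfWriter.GetInstance(doc, fs)
            For Each imageFile In imageFiles
                Dim img = iTextSharp.text.Image.GetInstance(imageFile)
                ''//Scale so that dpi pixels map to one inch (72 points)
                img.ScalePercent(72.0F / dpi * 100.0F)
                ''//Size the page to the scaled image; this applies to the next page that is started
                doc.SetPageSize(New iTextSharp.text.Rectangle(img.ScaledWidth, img.ScaledHeight))
                If doc.IsOpen() Then
                    doc.NewPage()
                Else
                    doc.Open()
                End If
                img.SetAbsolutePosition(0, 0)
                doc.Add(img)
            Next
            ''//Only close a document that was actually opened (no images means no pages)
            If doc.IsOpen() Then doc.Close()
        End Using
    End Sub
End Module
```

Each page of the resulting PDF then holds a single image, so any text that was drawn onto the image beforehand cannot be pulled out as a separate object.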
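Answer 1: (score: 1)
The whole page has to be rendered as an image. Otherwise you end up with "text objects" (the individual words/letters of the text) and a watermark object (the overlay image), which will always be distinct, separate parts of the page.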