Extracting non-digital "signature" images from a PDF using iTextSharp

Date: 2018-02-27 11:26:28

Tags: asp.net .net pdf itext

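I've been tasked with a project that involves reading the contents of PDF documents that have filled-in form fields and saving those contents to a database. After extracting the data, I should be able to recreate the document from the master copy of the template plus the stored form field data.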

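Our users will complete the forms/templates on mobile devices (specifically Android devices, in our environment), using the Adobe Acrobat Reader mobile app. Once a document is complete, each engineer who worked on it will sign it using the app's signature feature (there may be multiple signatures across multiple pages) and submit the form (currently by emailing a copy of the completed PDF to a specific email address).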

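Once I have a copy of the completed PDF, I can read the AcroForm fields from a .NET application using the iTextSharp library and do whatever I need to (store each field's "name" and "value" in the database):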

Dim reader As PdfReader = New PdfReader(pdfBytes)
Dim pdfFormFields As AcroFields = reader.AcroFields
For Each formField In reader.AcroFields.Fields.Keys
    Dim ff As New FormField
    ff.Name = formField
    ff.Value = pdfFormFields.GetField(formField)
    Do_stuff_with(ff)
Next

I can later repopulate a blank PDF template with this form data, but I'm struggling to read the "signatures" embedded in the file.
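For reference, the repopulation step might look like the following minimal sketch. It assumes iTextSharp 5.x, where `PdfStamper` and `AcroFields.SetField` are library API. `TemplateFiller`, `FillTemplate` and the name/value dictionary are hypothetical stand-ins for whatever the database returns.

```vb
Imports System.Collections.Generic
Imports System.IO
Imports iTextSharp.text.pdf

Public Module TemplateFiller
    ' Fills the AcroForm fields of a blank template with stored name/value pairs
    ' and returns the resulting PDF as a byte array.
    Public Function FillTemplate(templateBytes As Byte(), values As IDictionary(Of String, String)) As Byte()
        Dim reader As New PdfReader(templateBytes)
        Using output As New MemoryStream()
            Dim stamper As New PdfStamper(reader, output)
            For Each pair As KeyValuePair(Of String, String) In values
                ' SetField returns False when the template has no field with this name
                stamper.AcroFields.SetField(pair.Key, pair.Value)
            Next
            stamper.Close()   ' writes the finished PDF into output
            reader.Close()
            Return output.ToArray()
        End Using
    End Function
End Module
```

This sketch covers only the text fields. The signatures need separate handling, because they are not AcroForm field values.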

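When a document is completed in the Acrobat Reader Android app, I don't think the "signature" is technically a signature in the true "digital signature" sense, i.e. one that could be read with iTextSharp's AcroFields.GetSignatureNames() and AcroFields.GetSignatureDictionary methods. Instead, I believe the signatures are stored as Annotation objects within the document's Streams, but so far I can't read them and convert them to a Byte Array to store in the database. I know I also need to capture the page and position of each signature so that I can put it back later.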
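If the signatures really are annotations, they are listed in each page's /Annots array rather than in the page's image resources. That array also gives the page number and the /Rect position needed to place them again later.

The following is a minimal sketch against iTextSharp 5.x. It assumes that each signature is an annotation (possibly /Ink or /Stamp; check the actual subtype in RUPS) carrying a normal appearance stream under /AP /N. `AnnotationInfo`, `AnnotationScanner` and `ListAnnotations` are hypothetical names.

```vb
Imports System.Collections.Generic
Imports iTextSharp.text.pdf

' Hypothetical holder for one annotation found on a page.
Public Class AnnotationInfo
    Public Property PageNumber As Integer
    Public Property Subtype As String           ' e.g. "/Ink", "/Stamp", "/Widget"
    Public Property Rect As Single()            ' llx, lly, urx, ury in PDF user space
    Public Property AppearanceBytes As Byte()   ' decoded /AP /N stream, or Nothing
End Class

Public Module AnnotationScanner
    ' Walks every page's /Annots array and records subtype, rectangle and the
    ' decoded normal-appearance stream of each annotation.
    Public Function ListAnnotations(pdfBytes As Byte()) As List(Of AnnotationInfo)
        Dim result As New List(Of AnnotationInfo)()
        Dim reader As New PdfReader(pdfBytes)
        Try
            For page As Integer = 1 To reader.NumberOfPages
                Dim annots As PdfArray = reader.GetPageN(page).GetAsArray(PdfName.ANNOTS)
                If annots Is Nothing Then Continue For
                For i As Integer = 0 To annots.Size - 1
                    Dim annot As PdfDictionary = annots.GetAsDict(i)
                    If annot Is Nothing Then Continue For
                    Dim info As New AnnotationInfo With {.PageNumber = page}
                    Dim subtype As PdfName = annot.GetAsName(PdfName.SUBTYPE)
                    info.Subtype = If(subtype Is Nothing, "", subtype.ToString())
                    Dim rect As PdfArray = annot.GetAsArray(PdfName.RECT)
                    If rect IsNot Nothing AndAlso rect.Size = 4 Then
                        info.Rect = New Single(3) {}
                        For k As Integer = 0 To 3
                            info.Rect(k) = rect.GetAsNumber(k).FloatValue
                        Next
                    End If
                    Dim ap As PdfDictionary = annot.GetAsDict(PdfName.AP)
                    If ap IsNot Nothing Then
                        ' /N may be a stream or a dictionary of states; only streams are read
                        Dim normal As PdfStream = ap.GetAsStream(PdfName.N)
                        If TypeOf normal Is PRStream Then
                            info.AppearanceBytes = PdfReader.GetStreamBytes(CType(normal, PRStream))
                        End If
                    End If
                    result.Add(info)
                Next
            Next
        Finally
            reader.Close()
        End Try
        Return result
    End Function
End Module
```

Storing `PageNumber`, `Rect` and `AppearanceBytes` per annotation keeps both what was drawn and where it was drawn.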

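I've tried several approaches, including extracting all images from the document, but that only extracts the images embedded in the template, not the annotation signatures. With a blank dummy document containing only a submit button and a signature, I get an error after retrieving the PdfName.RESOURCES dictionary object, because there is nothing in it: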

Dim pdf as New PdfReader(bytes)
For i As Int16 = 1 To pdf.NumberOfPages
    Dim pg As PdfDictionary = pdf.GetPageN(i)
    Dim res As PdfDictionary = CType(PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES)), PdfDictionary)
    ' ### 
    ' ### 
    ' ### Errors on next line with 
    ' ###     "Object reference not set to an instance of an object"
    ' ### 
    ' ### 
    Dim xobj As PdfDictionary = CType(PdfReader.GetPdfObject(pg.Get(PdfName.XOBJECT)), PdfDictionary)
    If xobj IsNot Nothing Then
        For Each name As PdfName In xobj.Keys
            Dim obj As PdfObject = xobj.Get(name)
            If obj.IsIndirect Then
                Dim tg As PdfDictionary = CType(PdfReader.GetPdfObject(obj), PdfDictionary)
                Dim type As PdfName = CType(PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE)), PdfName)
                If PdfName.IMAGE.Equals(type) Then
                    Dim xrefIdx As Integer = CType(obj, PRIndirectReference).Number
                    Dim pdfObj As PdfObject = pdf.GetPdfObject(xrefIdx)
                    Dim str As PdfStream = CType(pdfObj, PdfStream)
                    Dim bytes As Byte() = PdfReader.GetStreamBytesRaw(CType(str, PRStream))
                    Dim img As iTextSharp.text.Image = iTextSharp.text.Image.GetInstance(CType(obj, PRIndirectReference))
                    Dim filter As String = tg.Get(PdfName.FILTER).ToString
                    If filter = "/DCTDecode" Then
                        Dim img2 As System.Drawing.Image = System.Drawing.Image.FromStream(New MemoryStream(bytes))
                        Dim stream As MemoryStream = New MemoryStream
                        img2.Save(stream, System.Drawing.Imaging.ImageFormat.Jpeg)
                        stream.Position = 0
                        PdfReader.KillIndirect(obj)
                        img = iTextSharp.text.Image.GetInstance(stream)
                        writer.AddDirectImageSimple(img, CType(obj, PRIndirectReference))
                    End If
                End If
            End If
        Next
    End If
Next
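One thing to check in the loop above: in the PDF object model, /XObject is an entry of the page's /Resources dictionary, not of the page dictionary itself. As a result, `pg.Get(PdfName.XOBJECT)` is likely to find nothing. Even with the lookup corrected, annotation appearance streams are referenced from /Annots and not from the page resources, which would explain why only template images turn up.

The following is a minimal sketch of the corrected lookup with null checks, assuming iTextSharp 5.x. `XObjectLister` and `ListPageXObjects` are hypothetical names.

```vb
Imports System.Collections.Generic
Imports iTextSharp.text.pdf

Public Module XObjectLister
    ' Lists the XObjects a page's content can reference, with their /Subtype.
    Public Function ListPageXObjects(pdfBytes As Byte(), pageNumber As Integer) As List(Of String)
        Dim names As New List(Of String)()
        Dim reader As New PdfReader(pdfBytes)
        Try
            Dim res As PdfDictionary = reader.GetPageN(pageNumber).GetAsDict(PdfName.RESOURCES)
            If res Is Nothing Then Return names          ' page has no resources
            Dim xobj As PdfDictionary = res.GetAsDict(PdfName.XOBJECT)
            If xobj Is Nothing Then Return names         ' page has no XObjects
            For Each key As PdfName In xobj.Keys
                Dim stream As PdfStream = xobj.GetAsStream(key)
                Dim subtype As PdfName = If(stream Is Nothing, Nothing, stream.GetAsName(PdfName.SUBTYPE))
                ' /Image = raster image; /Form = vector content stream
                names.Add(key.ToString() & " " & If(subtype Is Nothing, "?", subtype.ToString()))
            Next
        Finally
            reader.Close()
        End Try
        Return names
    End Function
End Module
```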

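If I inspect the document with iText RUPS, I can see two Stream objects (Inspect1.png), which I believe are the two signatures in my test document TestDoc_Complete.pdf. However, I can't extract them into a Byte Array or MemoryStream so that I can manipulate and save them.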
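To find out what those two RUPS streams actually are, their dictionaries can be read by object number. A /Subtype of /Form with a /BBox would indicate a Form XObject, i.e. drawing instructions rather than an Image XObject. This interpretation is an assumption that the check is meant to test.

The following minimal sketch assumes iTextSharp 5.x. The object number is whatever RUPS displays, and `StreamInspector` and `DescribeObject` are hypothetical names.

```vb
Imports iTextSharp.text.pdf

Public Module StreamInspector
    ' Summarises the stream with the given object number (as shown in RUPS).
    Public Function DescribeObject(pdfBytes As Byte(), objectNumber As Integer) As String
        Dim reader As New PdfReader(pdfBytes)
        Try
            Dim obj As PdfObject = reader.GetPdfObject(objectNumber)
            If obj Is Nothing OrElse Not obj.IsStream() Then Return "not a stream"
            Dim stream As PRStream = CType(obj, PRStream)
            Dim subtype As PdfName = stream.GetAsName(PdfName.SUBTYPE)
            Dim bbox As PdfArray = stream.GetAsArray(PdfName.BBOX)
            Dim filter As PdfObject = stream.Get(PdfName.FILTER)
            Return String.Format("Subtype={0} BBox={1} Filter={2}",
                                 If(subtype Is Nothing, "(none)", subtype.ToString()),
                                 If(bbox Is Nothing, "(none)", bbox.ToString()),
                                 If(filter Is Nothing, "(none)", filter.ToString()))
        Finally
            reader.Close()
        End Try
    End Function
End Module
```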

Any help (VB / C#) with this would be appreciated.

Thanks

--EDIT--

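I can now loop through the PdfObjects using XrefSize and identify which objects are Streams. I can read each Stream's bytes and raw bytes and write them out to files, but no image viewer can open them.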

Dim pdf = New PdfReader(bytes)
Dim obj As PdfObject
For i As Integer = 1 To pdf.XrefSize
    Try
        obj = pdf.GetPdfObject(i)
        If obj IsNot Nothing AndAlso obj.IsStream Then   ' AndAlso short-circuits, so Nothing is never dereferenced
            Dim stream As PRStream = CType(obj, PRStream)
            Dim type As PdfName = Nothing
            Try
                type = CType(pdfreader.GetPdfObject(stream.Get(PdfName.FILTER)), PdfName)
            Catch ex As Exception
            End Try

            If type IsNot Nothing AndAlso PdfName.FLATEDECODE.Equals(type) Then
                Dim b1 As Byte() = pdfreader.GetStreamBytes(stream)
                Dim b2 As Byte() = pdfreader.GetStreamBytesRaw(stream)
                csLog.AddLog("Stream Length: " & stream.Length, csLogging.DebugLevel.Debug)
                csLog.AddLog("bytes1 Length: " & b1.Length, csLogging.DebugLevel.Debug)
                csLog.AddLog("bytes2 Length: " & b2.Length, csLogging.DebugLevel.Debug)

                Dim fos As FileStream
                ' ### Write Bytes to file for testing
                fos = New FileStream(Server.MapPath(".") & "\bytes1" & i, FileMode.Create)
                fos.Write(b1, 0, b1.Length)
                fos.Flush()
                fos.Close()

                ' ### Write RawBytes to file for testing
                fos = New FileStream(Server.MapPath(".") & "\bytes2" & i, FileMode.Create)
                fos.Write(b2, 0, b2.Length)
                fos.Flush()
                fos.Close()


                ' ### CONVERSION ATTEMPTS
                ConvertAttempt1(b2, i)     ' ### Using Raw Bytes
                ConvertAttempt2(stream, i)

            End If
        End If

    Catch ex As Exception
    End Try
Next

The first file (from b1) appears to be a text representation of the PRStream:

q .160714 0 0 .160714 0 0 cm 0.00000 0.00000 0.00000 RG 0.00000 0.00000 0.00000 rg 1 J 1 j 26.48880 w 488.00000 115.43372 m 488.00000 115.43372 l S 20.37600 w 184.00000 155.43372 m 184.00000 155.43372 184.00000 155.43372 182.44000 156.45367 c S 20.37600 w ..... ..... .....

I assume these are vector graphics curves/strokes. The output of b2 (the raw bytes) has the content lengths I would expect for the signatures (3305 and 4834), matching the two markers shown in iText RUPS.
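The dump above consists of PDF path operators: `m` (moveto), `l` (lineto), `c` (Bézier curveto), `S` (stroke), `w` (line width) and `cm` (transform). In other words, b1 is a description of pen strokes rather than pixel data, so producing a picture from it means rendering it.

The following minimal System.Drawing sketch has several assumptions and limitations:
- It assumes the stream uses only the operators visible in the dump.
- It ignores the colour operators and draws everything in black.
- It assumes `width`/`height` come from the stream's /BBox, which the question does not show.
- `StrokeRenderer` and `RenderStrokes` are hypothetical names.
- It is not a general PDF renderer.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Drawing
Imports System.Drawing.Drawing2D
Imports System.Globalization
Imports System.Text
Imports Microsoft.VisualBasic

Public Module StrokeRenderer
    ' Rasterises a decoded content stream that only strokes paths.
    ' Handles q Q cm w m l c S; other operators (RG, rg, J, j, ...) are skipped with their operands.
    ' Malformed input (too few operands) is not validated in this sketch.
    Public Function RenderStrokes(content As Byte(), width As Single, height As Single, scale As Single) As Bitmap
        Dim bmp As New Bitmap(Math.Max(1, CInt(Math.Ceiling(width * scale))),
                              Math.Max(1, CInt(Math.Ceiling(height * scale))))
        Dim ops As New List(Of Single)()
        Dim ctm As New Matrix()
        Dim lineWidth As Single = 1.0F
        Dim saved As New Stack(Of Tuple(Of Matrix, Single))()
        Dim path As New GraphicsPath()
        Dim current As PointF
        ' PDF y grows upwards, GDI+ y grows downwards: flip, then scale to pixels.
        Dim device As New Matrix(scale, 0, 0, -scale, 0, height * scale)
        Dim separators As Char() = New Char() {" "c, ControlChars.Cr, ControlChars.Lf, ControlChars.Tab}

        Using g As Graphics = Graphics.FromImage(bmp)
            g.Clear(Color.White)
            g.SmoothingMode = SmoothingMode.AntiAlias
            For Each token As String In Encoding.ASCII.GetString(content).Split(separators, StringSplitOptions.RemoveEmptyEntries)
                Dim value As Single
                If Single.TryParse(token, NumberStyles.Float, CultureInfo.InvariantCulture, value) Then
                    ops.Add(value)   ' operands precede their operator
                    Continue For
                End If
                Select Case token
                    Case "q"
                        saved.Push(Tuple.Create(ctm.Clone(), lineWidth))
                    Case "Q"
                        If saved.Count > 0 Then
                            Dim state As Tuple(Of Matrix, Single) = saved.Pop()
                            ctm = state.Item1
                            lineWidth = state.Item2
                        End If
                    Case "cm"
                        ' new CTM = operand matrix x current CTM
                        ctm.Multiply(New Matrix(ops(0), ops(1), ops(2), ops(3), ops(4), ops(5)), MatrixOrder.Prepend)
                    Case "w"
                        lineWidth = ops(0)
                    Case "m"
                        path.StartFigure()
                        current = New PointF(ops(0), ops(1))
                    Case "l"
                        Dim p As New PointF(ops(0), ops(1))
                        path.AddLine(current, p)
                        current = p
                    Case "c"
                        Dim p3 As New PointF(ops(4), ops(5))
                        path.AddBezier(current, New PointF(ops(0), ops(1)), New PointF(ops(2), ops(3)), p3)
                        current = p3
                    Case "S"
                        ' Draw in user space so the pen width is transformed along with the path.
                        Dim m As Matrix = ctm.Clone()
                        m.Multiply(device, MatrixOrder.Append)
                        g.Transform = m
                        If path.PointCount > 0 Then
                            Using pen As New Pen(Color.Black, lineWidth)
                                pen.StartCap = LineCap.Round    ' the dump sets 1 J (round cap)
                                pen.EndCap = LineCap.Round
                                pen.LineJoin = LineJoin.Round   ' and 1 j (round join)
                                g.DrawPath(pen, path)
                            End Using
                        End If
                        path.Reset()
                End Select
                ops.Clear()
            Next
        End Using
        path.Dispose()
        Return bmp
    End Function
End Module
```

Rasterising is mainly useful for showing a signature outside a PDF. If the goal is to put the signature back into a regenerated PDF, storing the decoded bytes of b1 together with the page number and position may be simpler, because the vector data keeps its quality.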

I tried converting the bytes to an image (JPG), but I get errors:

' ### Conversion attempt 1
Sub ConvertAttempt1(ByVal rawBytes As Byte(), ByVal xRef As Int16)
    Try
        Using memStream As MemoryStream = New MemoryStream(rawBytes)
            memStream.Position = 0
            ' ###
            ' ###
            ' ### Falls over on next line - "Parameter is not valid"
            ' ###
            ' ###
            Dim img As System.Drawing.Image = System.Drawing.Image.FromStream(memStream)
            Dim path As String = System.IO.Path.Combine(Server.MapPath("."), String.Format("convert1_{0}.jpg", xRef))
            Dim parms As System.Drawing.Imaging.EncoderParameters = New System.Drawing.Imaging.EncoderParameters(1)
            parms.Param(0) = New System.Drawing.Imaging.EncoderParameter(System.Drawing.Imaging.Encoder.Compression, 0)
            Dim jpegEncoder As System.Drawing.Imaging.ImageCodecInfo = GetImageEncoder("JPEG")
            img.Save(path, jpegEncoder, parms)
        End Using
    Catch ex As Exception
        'csLog.AddLog(ex.Message, csLogging.DebugLevel.Errors)
        'csLog.AddLog(ex.StackTrace, csLogging.DebugLevel.Errors)
    End Try
End Sub

Sub ConvertAttempt2(ByVal stream As PRStream, ByVal xRef As Int16)
    ' ### Conversion attempt 2
    Try
        ' ###
        ' ###
        ' ### Falls over on next line - "Object reference not set to an instance of an object."
        ' ###
        ' ###
        Dim pdfImage As PdfImageObject = New PdfImageObject(stream)
        Dim img As System.Drawing.Image = pdfImage.GetDrawingImage()
        Dim path As String = System.IO.Path.Combine(Server.MapPath("."), String.Format("convert2_{0}.jpg", xRef))
        Dim parms As System.Drawing.Imaging.EncoderParameters = New System.Drawing.Imaging.EncoderParameters(1)
        parms.Param(0) = New System.Drawing.Imaging.EncoderParameter(System.Drawing.Imaging.Encoder.Compression, 0)
        Dim jpegEncoder As System.Drawing.Imaging.ImageCodecInfo = GetImageEncoder("JPEG")
        img.Save(path, jpegEncoder, parms)
    Catch ex As Exception
        csLog.AddLog(ex.Message, csLogging.DebugLevel.Errors)
        csLog.AddLog(ex.StackTrace, csLogging.DebugLevel.Errors)
    End Try
End Sub
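Both attempts expect raster image data. `Image.FromStream` only decodes encoded raster formats such as JPEG or PNG. `PdfImageObject` is intended for Image XObjects (/Subtype /Image with /Width, /Height and so on). The streams here, however, carry the /FlateDecode filter and appear to contain drawing operators.

Checking the first bytes of b2 is a quick way to confirm this. The following minimal sketch uses `ByteSniffer` and `GuessFormat`, which are hypothetical names. It assumes that Flate data starts with a standard zlib header, which is how FlateDecode streams are normally written.

```vb
Public Module ByteSniffer
    ' Guesses what a byte array contains from its leading "magic" bytes.
    Public Function GuessFormat(data As Byte()) As String
        If data Is Nothing OrElse data.Length < 2 Then Return "unknown"
        If data(0) = &HFF AndAlso data(1) = &HD8 Then Return "jpeg"
        If data.Length >= 4 AndAlso data(0) = &H89 AndAlso data(1) = &H50 AndAlso
           data(2) = &H4E AndAlso data(3) = &H47 Then Return "png"
        ' zlib header: CMF = &H78 (deflate, 32K window) and (CMF * 256 + FLG) divisible by 31
        If data(0) = &H78 AndAlso (CInt(data(0)) * 256 + data(1)) Mod 31 = 0 Then Return "zlib"
        Return "unknown"
    End Function
End Module
```

If b2 reports "zlib", it is still compressed, and `PdfReader.GetStreamBytes` (which produced b1) is the call that decodes it. Only "jpeg" or "png" data would be accepted by `Image.FromStream`.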

0 answers