Extracting an image together with its text from a PDF and editing it using iTextSharp

Time: 2016-02-01 18:14:39

Tags: vb.net itextsharp

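I am trying to do the following in a Windows Forms application: 1) read a PDF in a Windows Form, 2) get the image together with its text, 3) color/fill the image, 4) save everything to a new file.

I tried the approach from "Problem with PdfTextExtractor in itext!", but it did not help.

Here is the code I tried: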

 Imports System.IO
 Imports iTextSharp.text.pdf

 Public Shared Sub ExtractImagesFromPDF(sourcePdf As String, outputPath As String)
    ' NOTE: This only gets the first image it finds per page.
    Dim pdf As New PdfReader(sourcePdf)

    Try
        ' Create the output folder once, before looping over the pages.
        If Not Directory.Exists(outputPath) Then
            Directory.CreateDirectory(outputPath)
        End If

        For pageNumber As Integer = 1 To pdf.NumberOfPages
            Dim pg As PdfDictionary = pdf.GetPageN(pageNumber)

            ' Recursively search pages, forms and groups for images.
            ' FindImageInPDFDictionary is a helper that is not shown in this post.
            Dim obj As PdfObject = FindImageInPDFDictionary(pg)
            If obj IsNot Nothing Then
                Dim xrefIndex As Integer = DirectCast(obj, PRIndirectReference).Number
                Dim pdfObj As PdfObject = pdf.GetPdfObject(xrefIndex)
                Dim pdfStream As PRStream = DirectCast(pdfObj, PRStream)

                ' These are the raw (undecoded) stream bytes. Image.FromStream can only read
                ' them when the image is stored in a format GDI+ understands, e.g. JPEG (DCTDecode).
                Dim bytes As Byte() = PdfReader.GetStreamBytesRaw(pdfStream)
                If bytes IsNot Nothing Then
                    Using memStream As New MemoryStream(bytes)
                        ' Must save the file while the stream is open.
                        Using img As System.Drawing.Image = System.Drawing.Image.FromStream(memStream)
                            Dim imagePath As String = Path.Combine(outputPath, String.Format("{0}.jpg", pageNumber))
                            ' State the format explicitly so the content matches the .jpg extension.
                            img.Save(imagePath, System.Drawing.Imaging.ImageFormat.Jpeg)
                        End Using
                    End Using
                End If
            End If
        Next
    Finally
        pdf.Close()
    End Try
 End Sub
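
The code above reads raw stream bytes, which only works for images that GDI+ can read directly, and it stops at the first image on each page. A possible alternative, sketched here under the assumption that iTextSharp 5.x is in use, is to let the content parser report every image through an `IRenderListener`. The names `ImageSavingListener`, `ImageExtraction` and `ExtractAllImages` are made up for this illustration, and `GetImage()` can still fail for image encodings that iTextSharp does not support.

```vb
Imports System.IO
Imports iTextSharp.text.pdf
Imports iTextSharp.text.pdf.parser

' Hypothetical listener: writes every image the parser reports for one page to disk.
Public Class ImageSavingListener
    Implements IRenderListener

    Private ReadOnly _outputPath As String
    Private ReadOnly _pageNumber As Integer
    Private _count As Integer = 0

    Public Sub New(outputPath As String, pageNumber As Integer)
        _outputPath = outputPath
        _pageNumber = pageNumber
    End Sub

    ' Text events are not needed for image extraction, so they are left empty.
    Public Sub BeginTextBlock() Implements IRenderListener.BeginTextBlock
    End Sub

    Public Sub EndTextBlock() Implements IRenderListener.EndTextBlock
    End Sub

    Public Sub RenderText(renderInfo As TextRenderInfo) Implements IRenderListener.RenderText
    End Sub

    Public Sub RenderImage(renderInfo As ImageRenderInfo) Implements IRenderListener.RenderImage
        Dim imageObject As PdfImageObject = renderInfo.GetImage()
        If imageObject Is Nothing Then Return

        _count += 1
        ' GetFileType() returns the extension iTextSharp considers appropriate for the bytes.
        Dim fileName As String = String.Format("{0}_{1}.{2}", _pageNumber, _count, imageObject.GetFileType())
        ' System.IO.Path is fully qualified because newer 5.5.x releases also define
        ' a Path class in iTextSharp.text.pdf.parser.
        File.WriteAllBytes(System.IO.Path.Combine(_outputPath, fileName), imageObject.GetImageAsBytes())
    End Sub
End Class

Public Module ImageExtraction
    ' Hypothetical entry point: extracts all images from all pages of sourcePdf.
    Public Sub ExtractAllImages(sourcePdf As String, outputPath As String)
        Directory.CreateDirectory(outputPath) ' does nothing if the folder already exists
        Dim reader As New PdfReader(sourcePdf)
        Try
            Dim parser As New PdfReaderContentParser(reader)
            For pageNumber As Integer = 1 To reader.NumberOfPages
                parser.ProcessContent(pageNumber, New ImageSavingListener(outputPath, pageNumber))
            Next
        Finally
            reader.Close()
        End Try
    End Sub
End Module
```

This only helps if the bin is really stored as a raster image. If the box is drawn with vector graphics, no image event will be raised for it.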

The actual goal is to end up with something like this: [image]

If this were the actual PDF, I would have to check whether that bin contains any items (based on the text inside that box).
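
One way the "is there text inside that box" check might be done is with a region filter on the text extractor. This is a minimal sketch assuming iTextSharp 5.x and assuming the position of the bin on the page is already known. `BinTextCheck` and `RegionHasText` are hypothetical names, and the rectangle in the usage example is a placeholder, not a coordinate taken from the actual PDF.

```vb
Imports iTextSharp.text.pdf
Imports iTextSharp.text.pdf.parser

Public Module BinTextCheck
    ' Returns True when any non-whitespace text lies inside the given region of the page.
    ' The region is expressed in PDF user-space units (points), measured from the lower-left corner.
    ' iTextSharp.text.Rectangle is fully qualified to avoid a clash with System.Drawing.Rectangle.
    Public Function RegionHasText(reader As PdfReader, pageNumber As Integer,
                                  region As iTextSharp.text.Rectangle) As Boolean
        Dim filter As New RegionTextRenderFilter(region)
        Dim strategy As ITextExtractionStrategy =
            New FilteredTextRenderListener(New LocationTextExtractionStrategy(), filter)

        ' Only text that passes the region filter reaches the extraction strategy.
        Dim text As String = PdfTextExtractor.GetTextFromPage(reader, pageNumber, strategy)
        Return Not String.IsNullOrWhiteSpace(text)
    End Function
End Module
```

A call could look like `RegionHasText(reader, 1, New iTextSharp.text.Rectangle(100, 500, 200, 600))`, where the four numbers are the lower-left x, lower-left y, upper-right x and upper-right y of the bin and would have to be measured on the real page.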

If it does contain items, I have to color it as shown below: [image]
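
Coloring a bin and saving the result to a new file might be sketched with `PdfStamper`, which copies the source PDF and allows drawing on top of existing pages. This assumes iTextSharp 5.x. `BinHighlighter` and `FillRegion` are hypothetical names, and the 40% opacity is an arbitrary choice so that the text in the box stays readable.

```vb
Imports System.IO
Imports iTextSharp.text.pdf

Public Module BinHighlighter
    ' Draws a semi-transparent filled rectangle over the given region of one page
    ' and writes the modified document to targetPdf. The source file is not changed.
    Public Sub FillRegion(sourcePdf As String, targetPdf As String, pageNumber As Integer,
                          region As iTextSharp.text.Rectangle, fillColor As iTextSharp.text.BaseColor)
        Dim reader As New PdfReader(sourcePdf)
        Try
            Using output As New FileStream(targetPdf, FileMode.Create)
                Dim stamper As New PdfStamper(reader, output)
                Try
                    ' GetOverContent draws on top of the existing page content.
                    Dim canvas As PdfContentByte = stamper.GetOverContent(pageNumber)
                    canvas.SaveState()

                    ' Make the fill translucent so the text underneath remains visible.
                    Dim gs As New PdfGState()
                    gs.FillOpacity = 0.4F
                    canvas.SetGState(gs)

                    canvas.SetColorFill(fillColor)
                    canvas.Rectangle(region.Left, region.Bottom, region.Width, region.Height)
                    canvas.Fill()
                    canvas.RestoreState()
                Finally
                    ' Closing the stamper is what actually writes the new PDF.
                    stamper.Close()
                End Try
            End Using
        Finally
            reader.Close()
        End Try
    End Sub
End Module
```

For example, `FillRegion("in.pdf", "out.pdf", 1, New iTextSharp.text.Rectangle(100, 500, 200, 600), iTextSharp.text.BaseColor.GREEN)` would tint a placeholder region on page 1. The file names and coordinates are illustrative only. Using `GetUnderContent` instead would draw beneath the page content, which only shows if the box itself has no opaque background.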

Can anyone help me?

The PDF can be retrieved here.

0 answers:

No answers