我正在尝试在Windows窗体中执行以下操作 1)在Windows窗体中阅读PDF 2)获取带有文本的图像 3)颜色/填充图像 4)将所有内容保存到新文件
我试过Problem with PdfTextExtractor in itext! 但它没有帮助。
以下是我尝试过的代码:
Public Shared Sub ExtractImagesFromPDF(sourcePdf As String, outputPath As String)
'NOTE: This will only get the first image it finds per page.'
Dim pdf As New PdfReader(sourcePdf)
Dim raf As RandomAccessFileOrArray = New iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf)
Try
For pageNumber As Integer = 1 To pdf.NumberOfPages
Dim pg As PdfDictionary = pdf.GetPageN(pageNumber)
' recursively search pages, forms and groups for images.'
Dim obj As PdfObject = FindImageInPDFDictionary(pg)
If obj IsNot Nothing Then
Dim XrefIndex As Integer = Convert.ToInt32(DirectCast(obj, PRIndirectReference).Number.ToString(System.Globalization.CultureInfo.InvariantCulture))
Dim pdfObj As PdfObject = pdf.GetPdfObject(XrefIndex)
Dim pdfStrem As PdfStream = DirectCast(pdfObj, PdfStream)
Dim bytes As Byte() = PdfReader.GetStreamBytesRaw(DirectCast(pdfStrem, PRStream))
If (bytes IsNot Nothing) Then
Using memStream As New System.IO.MemoryStream(bytes)
memStream.Position = 0
Dim img As System.Drawing.Image = System.Drawing.Image.FromStream(memStream)
' must save the file while stream is open.'
If Not Directory.Exists(outputPath) Then
Directory.CreateDirectory(outputPath)
End If
Dim path__1 As String = Path.Combine(outputPath, [String].Format("{0}.jpg", pageNumber))
Dim parms As New System.Drawing.Imaging.EncoderParameters(1)
parms.Param(0) = New System.Drawing.Imaging.EncoderParameter(System.Drawing.Imaging.Encoder.Compression, 0)
'Dim jpegEncoder As System.Drawing.Imaging.ImageCodecInfo = iTextSharp.text.Utilities.GetImageEncoder("JPEG")'
img.Save(path__1) 'jpegEncoder, parms'
End Using
End If
End If
Next
Catch
Throw
Finally
pdf.Close()
raf.Close()
End Try
End Sub
现在,这个的实际目的是得到这样的东西
如果这是实际的PDF,我将不得不检查该垃圾箱中是否有任何物品(通过该框中的文字)
如果有物品,那么我必须像下面那样给它上色
有人可以帮助我吗
可以检索PDF here。