PDF file is corrupt and cannot be repaired when moving a memory stream to a file stream

Time: 2012-06-05 20:50:10

Tags: c# .net vb.net itext

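I am using iTextSharp with VB.Net to stamp images onto PDF documents. (Since this is not language-specific, I have tagged it C# as well.) I have two applications that use this process.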

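  • The first uses the bytes from the memory stream to display the PDF file online. This piece is working.

  • The second uses the same function but saves the PDF to a file instead. This piece generates an invalid PDF.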

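I have seen a few similar questions, but they all create a document from scratch and have a Document object in their code. Their memory streams are corrupt from the start. My code has no Document object, and my original memory stream opens fine.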

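Here is where I get the error. (I have to put the buffer from m into a new memory stream because the stamper in the fillPDF function closes the stream by default unless told otherwise.)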

Dim m As MemoryStream = PDFHelper.fillPDF(filename, Nothing, markers, "")
Dim m2 As New MemoryStream(m.GetBuffer, 0, m.GetBuffer.Length)
Dim f As FileStream = New FileStream("C:\temp.pdf", FileMode.Create)
m2.CopyTo(f, m.GetBuffer.Length)
m2.Close()
f.Close()

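Here is one of the places where I use it successfully on the website. This one does not use images, although some other similar, working places do stamp images onto multiple documents that are then merged together.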

Dim m As System.IO.MemoryStream = PDFHelper.fillPDF(filename, New Dictionary(Of String, String), New List(Of PDFHelper.PDfImage), "SAMPLE")
Dim data As Byte() = m.GetBuffer
Response.Clear()

' Send the file to the output stream
Response.Buffer = True

' Try to ensure the browser always opens the file and doesn't just prompt to "open/save".
Response.AddHeader("Content-Length", data.Length.ToString())
Response.AddHeader("Content-Disposition", "inline; filename=" + "Sample")
Response.AddHeader("Expires", "0")
Response.AddHeader("Pragma", "cache")
Response.AddHeader("Cache-Control", "private")

' Set the output stream to the correct content type (PDF).
Response.ContentType = "application/pdf"
Response.AddHeader("Accept-Ranges", "bytes")

' Output the file
Response.BinaryWrite(data)

' Flush the Response to display the serialized data in the client browser.
Response.Flush()

Try
    Response.End()
Catch ex As Exception
    Throw ex
End Try

Here is the function from my utility class (PDFHelper.fillPDF):

  Public Shared Function fillPDF(fileToFill As String, Optional fieldValues As Dictionary(Of String, String) = Nothing, Optional images As List(Of PDfImage) = Nothing, Optional watermarkText As String = "") As MemoryStream

        Dim m As MemoryStream = New MemoryStream() ' for storing the pdf
        Dim reader As PdfReader = New PdfReader(fileToFill) ' for reading the document
        Dim outStamper As PdfStamper = New PdfStamper(reader, m) ' for filling the document

        If fieldValues IsNot Nothing Then
            For Each kvp As KeyValuePair(Of String, String) In fieldValues
                outStamper.AcroFields.SetField(kvp.Key, kvp.Value)
            Next
        End If


        If images IsNot Nothing AndAlso images.Count > 0 Then ' add all the images

            For Each PDfImage In images
                Dim img As iTextSharp.text.Image = Nothing ' image to stamp

                ' set up the image (different for different cases)
                Select Case PDfImage.ImageType
                    ' removed for brevity
                End Select

                Dim overContent As PdfContentByte = outStamper.GetOverContent(PDfImage.PageNumber) ' specify page number for stamping
                overContent.AddImage(img)

            Next

        End If

        ' add the watermark
        If watermarkText <> "" Then
            Dim underContent As iTextSharp.text.pdf.PdfContentByte = Nothing
            Dim watermarkRect As iTextSharp.text.Rectangle = reader.GetPageSizeWithRotation(1)

          ' removed for brevity
        End If

        ' flatten and close out
        outStamper.FormFlattening = True
        outStamper.SetFullCompression()
        outStamper.Close()
        reader.Close()
        Return m
  End Function

2 Answers:

Answer 0: (score: 3)

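Since your code already works for streaming the PDF, one simple way to fix your problem is to make a small change to your fillPDF method and have it return a byte array: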

// other parameters left out for simplicity's sake
public static byte[] fillPDF(string resource) {
  PdfReader reader = new PdfReader(resource);
  using (var ms = new MemoryStream()) {
    using (PdfStamper stamper = new PdfStamper(reader, ms)) {
      // do whatever you need to do
    }
    return ms.ToArray();
  }      
}

You can then both stream the byte array to the client in ASP.NET and save it to the file system:

// get the manipulated PDF    
byte[] myPdf = fillPDF(inputFile);
// stream via ASP.NET
Response.BinaryWrite(myPdf);
// save to file system
File.WriteAllBytes(outputFile, myPdf);

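If you are generating the PDF from a standard ASP.NET web form, don't forget to call Response.End() after writing the PDF; otherwise the byte array will have HTML markup garbage appended to the end.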
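One probable reason ToArray() helps here: MemoryStream.GetBuffer() returns the stream's whole underlying buffer, including any allocated but unused bytes past the data that was written, while ToArray() returns only the written bytes. The question's file-saving code copies m.GetBuffer.Length bytes, so the saved file may end with padding after the end of the PDF. The following is a minimal sketch of that difference using only System.IO; the module name and the five-byte payload are made up for illustration and stand in for the PDF data.

```vbnet
Imports System
Imports System.IO

Module BufferDemo
    Sub Main()
        Using ms As New MemoryStream()
            ' Hypothetical payload standing in for the stamped PDF bytes
            Dim payload As Byte() = {1, 2, 3, 4, 5}
            ms.Write(payload, 0, payload.Length)

            ' ToArray copies only the bytes actually written
            Console.WriteLine("ToArray length:   " & ms.ToArray().Length)

            ' GetBuffer exposes the whole internal buffer (its capacity),
            ' which is at least as large as the written data
            Console.WriteLine("GetBuffer length: " & ms.GetBuffer().Length)
        End Using
    End Sub
End Module
```

If the function has to keep returning a MemoryStream, calling ToArray() on the returned stream instead of GetBuffer should give the same trimmed result, since ToArray() also works after the stream has been closed.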

Answer 1: (score: 0)

This copies an existing PDF into a MemoryStream and then saves it to disk. Perhaps you can adapt it to solve your problem?

  Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
    Dim strInputFilename As String = "C:\Junk\Junk.pdf"
    Dim strOutputFilename As String = "C:\Junk\Junk2.pdf"
    Dim byt() As Byte
    Using ms As New MemoryStream
      '1. Load PDF into memory stream'
      Using bw As New BinaryWriter(ms)
        Using fsi As New FileStream(strInputFilename, FileMode.Open)
          Using br As New BinaryReader(fsi)
            Try
              Do
                bw.Write(br.ReadByte())
              Loop
            Catch ex As EndOfStreamException
            End Try
          End Using
        End Using
      End Using
      byt = ms.ToArray()
    End Using
    '2. Write memory copy of PDF back to disk'
    My.Computer.FileSystem.WriteAllBytes(strOutputFilename, byt, False)
    Process.Start(strOutputFilename)
  End Sub
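
The byte-by-byte loop above relies on an EndOfStreamException to stop. As a hedged alternative sketch, Stream.CopyTo (available from .NET Framework 4) performs the same copy in a single call. The module name, the temporary file paths, and the four-byte sample content below are assumptions for illustration only, not part of the original answer.

```vbnet
Imports System
Imports System.IO

Module CopyDemo
    Sub Main()
        ' Hypothetical input and output files created in the temp folder
        Dim inputPath As String = Path.GetTempFileName()
        Dim outputPath As String = Path.GetTempFileName()
        File.WriteAllBytes(inputPath, New Byte() {37, 80, 68, 70}) ' ASCII "%PDF"

        Dim bytes As Byte()
        Using ms As New MemoryStream()
            ' 1. Load the file into the memory stream in one call
            Using fsi As New FileStream(inputPath, FileMode.Open, FileAccess.Read)
                fsi.CopyTo(ms)
            End Using
            ' ToArray returns only the bytes that were written
            bytes = ms.ToArray()
        End Using

        ' 2. Write the in-memory copy back to disk
        File.WriteAllBytes(outputPath, bytes)
        Console.WriteLine("Copied " & bytes.Length & " bytes")
    End Sub
End Module
```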