iTextSharp System.OutOfMemoryException

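Date: 2016-05-05 17:08:38

Tags: c# itextsharp

I'm having a problem creating a large PDF file. I have a list of byte arrays, each of which holds one PDF, and I want to merge them into a single PDF. This works fine for smaller files (under 2,000 pages), but it blows up when I try to create a 12,00-page file. Originally I was using a MemoryStream, but after some research I found that a common solution is to use a FileStream instead. I tried the FileStream approach and got similar results. The list contains 3,800 records, each with 4 pages. The MemoryStream version fails after about 570 records, the FileStream version after about 680. The output file was 60 MB when the code crashed. What am I doing wrong? Here is my code; it crashes on the `copy.AddPage(curPg);` statement inside the `for` loop.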

    private byte[] MergePDFs(List<byte[]> PDFs)
    {
        iTextSharp.text.Document doc = new iTextSharp.text.Document();
        byte[] completePDF;
        Guid uniqueId = Guid.NewGuid();
        string tempFileName = Server.MapPath("~/" + uniqueId.ToString() + ".pdf");

        //using (MemoryStream ms = new MemoryStream())
        using(FileStream ms = new FileStream(tempFileName, FileMode.Create, FileAccess.Write, FileShare.Read))
        {
            iTextSharp.text.pdf.PdfCopy copy = new iTextSharp.text.pdf.PdfCopy(doc, ms);
            doc.Open();

            int i = 0;
            foreach (byte[] PDF in PDFs)
            {
                i++;
                // Create a reader
                iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(PDF);

                // Cycle through all the pages
                for (int currentPageNumber = 1; currentPageNumber <= reader.NumberOfPages; ++currentPageNumber)
                {
                    // Read a page
                    iTextSharp.text.pdf.PdfImportedPage curPg = copy.GetImportedPage(reader, currentPageNumber);

                    // Add the page over to the rest of them
                    copy.AddPage(curPg);
                }

                // Close the reader
                reader.Close();
            }

            // Close the document
            doc.Close();

            // Close the copier
            copy.Close();

            // Convert the memorystream to a byte array
            //completePDF = ms.ToArray();
        }

        //return completePDF;
        return GetPDFsByteArray(tempFileName);
    }

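2 Answers:

Answer 0 (score: 3)

A few notes:

  1. PdfCopy implements IDisposable, so you should try wrapping it in a using block to see whether that helps.
  2. PdfCopy.FreeReader() should help too.
  3. Not sure whether you're using MVC or WebForms, but here is a simple working HTTP handler, tested on a workstation with a 15-page, 125 KB test file: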

    <%@ WebHandler Language="C#" Class="MergeFiles" %>
    using System;
    using System.Collections.Generic;
    using System.Web;
    using System.IO; 
    using iTextSharp.text; 
    using iTextSharp.text.pdf; 
    
    public class MergeFiles : IHttpHandler
    {
        public void ProcessRequest(HttpContext context)
        {
            List<byte[]> pdfs = new List<byte[]>();
            var pdf = File.ReadAllBytes(context.Server.MapPath("~/app_data/test.pdf"));
            for (int i = 0; i < 4000; ++i) pdfs.Add(pdf);
    
            var Response = context.Response;
            Response.ContentType = "application/pdf";
            Response.AddHeader(
                "content-disposition",
                "attachment; filename=MergeLotsOfPdfs.pdf"
            );
            Response.BinaryWrite(MergeLotsOfPdfs(pdfs));
        }
    
        byte[] MergeLotsOfPdfs(List<byte[]> pdfs)
        {
            using (var ms = new MemoryStream())
            {
                using (Document document = new Document())
                {
                    using (PdfCopy copy = new PdfCopy(document, ms))
                    {
                        document.Open();
                        for (int i = 0; i < pdfs.Count; ++i)
                        {
                            using (PdfReader reader = new PdfReader(
                                new RandomAccessFileOrArray(pdfs[i]), null))
                            {
                                copy.AddDocument(reader);
                                copy.FreeReader(reader);
                            }
                        }
                    }
                }
                return ms.ToArray();
            }
        }
    
        public bool IsReusable { get { return false; } }
    }
    

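This tries to produce an output file similar to the one you describe in the question, but YMMV, depending on the size of the individual PDFs you're processing. These are the test results from my run:

[screenshot of test results not preserved]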

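Answer 1 (score: 0)

After a lot of messing around, I realized I was getting nowhere. I did manage to find a workaround, though: instead of returning a byte array, I return the path of a temp file, which I then send to the client and delete.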

    private string MergeLotsOfPDFs(List<byte[]> PDFs)
    {
        Document doc = new Document();
        Guid uniqueId = Guid.NewGuid();
        string tempFileName = Server.MapPath("~/__" + uniqueId.ToString() + ".pdf");

        using (FileStream ms = new FileStream(tempFileName, FileMode.Create, FileAccess.Write, FileShare.Read))
        {
            PdfCopy copy = new PdfCopy(doc, ms);
            doc.Open();

            int i = 0;
            foreach (byte[] PDF in PDFs)
            {
                i++;
                // Create a reader
                PdfReader reader = new PdfReader(new RandomAccessFileOrArray(PDF), null);

                // Cycle through all the pages
                for (int currentPageNumber = 1; currentPageNumber <= reader.NumberOfPages; ++currentPageNumber)
                {
                    // Read a page
                    PdfImportedPage curPg = copy.GetImportedPage(reader, currentPageNumber);

                    // Add the page over to the rest of them
                    copy.AddPage(curPg);

                }

                // Free the reader's cached data once all of its pages have been copied
                copy.FreeReader(reader);
                reader.Close();
            }

            // Close the document
            doc.Close();

            // Close the copier
            copy.Close();
        }

        // Return temp file path
        return tempFileName;
    }
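
For comparison, the notes from the first answer (`using` blocks, `AddDocument`, `FreeReader`) can be combined with the temp-file idea above. The following is only a sketch under assumptions: the class name `PdfMerger`, the method name `MergeToFile`, and its parameters are hypothetical, and it uses only the iTextSharp calls that already appear in this thread. It has not been measured against the 3,800-record input from the question.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfMerger
{
    // Hypothetical helper: merges the given PDFs straight into a file on disk,
    // so the merged result is never held in memory as a single byte array.
    // Note: iTextSharp throws on close if no pages were added (empty input).
    public static void MergeToFile(IEnumerable<byte[]> pdfs, string outputPath)
    {
        using (var fs = new FileStream(outputPath, FileMode.Create, FileAccess.Write, FileShare.None))
        using (var document = new Document())
        using (var copy = new PdfCopy(document, fs))
        {
            document.Open();
            foreach (byte[] pdf in pdfs)
            {
                // Same reader constructor as used in the answers above
                using (var reader = new PdfReader(new RandomAccessFileOrArray(pdf), null))
                {
                    copy.AddDocument(reader);  // copies every page of this source PDF
                    copy.FreeReader(reader);   // lets the copier drop what it cached for this reader
                }
            }
        }
    }
}
```

The caller would pass the same temp file path that `MergeLotsOfPDFs` builds and then send and delete the file as described below.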

Here is how I send that data to the client.

        // Send the merged PDF file to the user.
        System.Web.HttpResponse response = System.Web.HttpContext.Current.Response;
        response.ClearContent();
        response.ClearHeaders();
        response.ContentType = "application/pdf";
        response.AddHeader("Content-Disposition", "attachment; filename=1094C.pdf;");
        response.WriteFile(tempFileName);
        HttpContext.Current.Response.Flush(); // Sends all currently buffered output to the client.
        DeleteFile(tempFileName); // Call right after flush but before close
        HttpContext.Current.Response.SuppressContent = true;  // Gets or sets a value indicating whether to send HTTP content to the client.
        HttpContext.Current.ApplicationInstance.CompleteRequest(); // Causes ASP.NET to bypass all events and filtering in the HTTP pipeline chain of execution and directly execute the EndRequest event.
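
If buffering the whole file in the response turns out to be a concern for very large outputs, one alternative is to copy the temp file to the output stream in fixed-size chunks and delete it afterwards. This is a sketch, not what the code above does: `TempFileSender`, `CopyThenDelete`, and the 64 KB default buffer are hypothetical choices, and the helper is written against a plain `Stream` so it does not depend on ASP.NET.

```csharp
using System;
using System.IO;

public static class TempFileSender
{
    // Hypothetical helper: copies the file at 'path' into 'output' in chunks of
    // 'bufferSize' bytes, then deletes the file even if the copy throws.
    // Returns the number of bytes in the file.
    public static long CopyThenDelete(string path, Stream output, int bufferSize = 64 * 1024)
    {
        try
        {
            using (var input = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                input.CopyTo(output, bufferSize);
                return input.Length;
            }
        }
        finally
        {
            // Runs after the FileStream has been disposed; File.Delete does
            // nothing if the file is already gone.
            File.Delete(path);
        }
    }
}
```

In the handler above, the call would presumably look like `TempFileSender.CopyThenDelete(tempFileName, response.OutputStream)` with `response.BufferOutput = false` set beforehand, in place of `WriteFile` and the explicit `DeleteFile` call.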

Finally, here is a fancy DeleteFile method:

    private void DeleteFile(string fileName)
    {
        if (File.Exists(fileName))
        {
            try
            {
                File.Delete(fileName);
            }
            catch (Exception ex)
            {
                //Could not delete the file, wait and try again
                try
                {
                    System.GC.Collect();
                    System.GC.WaitForPendingFinalizers();
                    File.Delete(fileName);
                }
                catch
                {
                    //Could not delete the file still
                }
            }
        }
    }
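
The comment in `DeleteFile` says "wait and try again", but the method retries only once and does not actually wait. A bounded retry with a short pause could look roughly like the sketch below. The names `TempFileCleanup` and `TryDeleteWithRetry` and the default attempt count and delay are assumptions made for illustration, not part of the original code.

```csharp
using System;
using System.IO;
using System.Threading;

public static class TempFileCleanup
{
    // Hypothetical helper: tries to delete 'fileName' up to 'maxAttempts' times,
    // sleeping 'delayMs' milliseconds between attempts.
    // Returns true as soon as a delete call succeeds, false if every attempt failed.
    public static bool TryDeleteWithRetry(string fileName, int maxAttempts = 3, int delayMs = 200)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                File.Delete(fileName); // does nothing if the file does not exist
                return true;
            }
            catch (IOException)
            {
                // Typically the file is still open somewhere; retry after a pause
            }
            catch (UnauthorizedAccessException)
            {
                // Permissions or a read-only file; a retry may not help, but it is cheap
            }

            if (attempt < maxAttempts)
            {
                Thread.Sleep(delayMs);
            }
        }
        return false;
    }
}
```

Returning a bool lets the caller log a leftover temp file instead of silently swallowing the failure.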