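How can I merge huge PDF files with iText 7 without loading them completely into memory?

Asked: 2017-01-03 06:05:11

Tags: c# itext itext7

I am trying to merge two large PDF files without loading them completely into memory.

I tried PdfMerger, and I also tried it without PdfMerger, using code like this: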

    // Requires: using System.IO; using iText.Kernel.Pdf; using iText.Kernel.Pdf.Canvas;
    using (var writer = new PdfWriter(new FileStream(@"C:\Test\OutBig.pdf", FileMode.OpenOrCreate)))
    using (var outputDocument = new PdfDocument(writer)) {
        using (var inputDoc = new PdfDocument(new PdfReader(@"C:\Test\InBig.pdf"))) {
            for (int i = 1; i <= inputDoc.GetNumberOfPages(); i++) {
                // Draw each source page onto a new output page as a form XObject
                var newp = outputDocument.AddNewPage();
                var canvas = new PdfCanvas(newp);
                var origPage = inputDoc.GetPage(i);
                var copy = origPage.CopyAsFormXObject(outputDocument);
                canvas.AddXObject(copy, 0, 0);
                // Attempt to release everything tied to this page
                copy.Flush();
                origPage = null;
                canvas.Release();
                newp.Flush();
                writer.Flush();
                canvas = null;
                newp = null;
            }
        }
    }

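The code works, but every page is loaded into memory and stays loaded, so I end up with more than 1 GB held in memory.

Do you know how to merge two PDF files with iText 7 without loading them into memory?

Regards,

Patrice

2 Answers:

Answer 0: (score: 1)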

There are several ways to lower memory consumption when copying large documents. One of them is to exploit the fact that objects are read on demand. You can therefore copy pages from the source document to the destination document by opening and closing the source document several times, so that only a small batch of pages is held at any one time.

Below is the code in Java; it can be converted to C# almost entirely by capitalizing the method names.

    PdfDocument doc1 = new PdfDocument(new PdfReader(IN1));
    int numOfPages = doc1.getNumberOfPages();
    doc1.close();

    PdfDocument outDoc = new PdfDocument(new PdfWriter(OUT));

    int numOfPagesPerDocumentOpen = 10;
    for (int i = 1; i <= numOfPages; ) {
        int firstPageToCopy = i;
        int lastPageToCopy = Math.min(i + numOfPagesPerDocumentOpen - 1, numOfPages);

        doc1 = new PdfDocument(new PdfReader(IN1));
        doc1.copyPagesTo(firstPageToCopy, lastPageToCopy, outDoc);
        // Flush last lastPageToCopy - firstPageToCopy + 1 pages
        for (int j = 0; j <= lastPageToCopy - firstPageToCopy; j++) {
            outDoc.getPage(outDoc.getNumberOfPages() - j).flush(true);
        }
        doc1.close();
        i = lastPageToCopy + 1;
    }

    outDoc.close();
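To make the batching arithmetic in this answer easier to check, here is a minimal sketch that only computes the page ranges the loop would visit. It does not touch iText at all; the class name `PageBatches` and the method `ranges` are hypothetical helpers introduced for illustration, while the variable names follow the answer's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reproduces only the range arithmetic of the copy loop.
final class PageBatches {

    // Returns inclusive, 1-based {first, last} page ranges, one per reopen of the source.
    static List<int[]> ranges(int numOfPages, int numOfPagesPerDocumentOpen) {
        if (numOfPagesPerDocumentOpen < 1) {
            throw new IllegalArgumentException("batch size must be at least 1");
        }
        List<int[]> result = new ArrayList<>();
        for (int i = 1; i <= numOfPages; ) {
            int firstPageToCopy = i;
            // The last batch is clamped to the end of the document
            int lastPageToCopy = Math.min(i + numOfPagesPerDocumentOpen - 1, numOfPages);
            result.add(new int[] {firstPageToCopy, lastPageToCopy});
            i = lastPageToCopy + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        // A 25-page source copied 10 pages at a time
        for (int[] r : ranges(25, 10)) {
            System.out.println(r[0] + "-" + r[1]);
        }
    }
}
```

For a 25-page source and a batch size of 10, this prints 1-10, 11-20 and 21-25, so the source would be reopened three times. A smaller batch size should keep fewer pages in memory per pass at the cost of more reopen operations; the right trade-off depends on the documents involved.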

Answer 1: (score: 0)

I have now tried several components (Aspose, ITextSharp, Telerik), and Telerik seems to have cracked it.

I followed these steps, and memory usage stayed low.

Sample code

var files = Directory.GetFiles(bundlePath);

using (PdfStreamWriter fileWriter = new PdfStreamWriter(File.OpenWrite(outputFile)))
{
    // Iterate through the files you would like to merge
    foreach (string documentName in files)
    {
        // Open each of the files
        using (PdfFileSource fileToMerge = new PdfFileSource(File.OpenRead(documentName)))
        {
            // Iterate through the pages of the current document
            foreach (PdfPageSource pageToMerge in fileToMerge.Pages)
            {
                // Append the current page to the fileWriter, which holds the result FileStream
                fileWriter.WritePage(pageToMerge);
            }
        }
    }
}

ITextSharp: [image]

Aspose: [image]

Telerik: [image]