Question

I found a tutorial, How to break a PDF into parts, that shows how to use Adobe Acrobat to split a PDF file into separate PDF files, including by maximum file size:

Tools > Split Document > Split document by File size

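I have found many examples on StackOverflow of how to split a PDF by page using C#. But how can I do the latter? How can I use C# to split a PDF file into multiple PDF files, each under a maximum file size?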

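For example, suppose I have a PDF file with 70 pages and a size of 40 MB. Instead of splitting it into 7 PDF files of 10 pages each, how can I use C# to split it into about 5 PDF files of no more than 10 MB each?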

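The best approach I have seen so far is Using itextsharp to split a pdf into smaller pdf's based on size, where Cyfer13 uses iTextSharp to split the file by page and then groups those single-page files by size. But is there a more direct way to do this without first splitting by page?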
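
The split-by-page-then-group approach can be sketched without any PDF library once the size of each single-page file is known. This is a minimal sketch, assuming the per-page sizes have already been collected (for example from FileInfo.Length of each single-page file); PageGrouping and GroupPagesBySize are hypothetical names. Each single-page file carries its own fixed overhead and its own copy of shared resources such as fonts, so the summed size is only an estimate of the merged file's size, likely an overestimate.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating "split by page, then group by size":
// greedily packs consecutive pages into groups whose summed size stays within the limit.
public static class PageGrouping
{
    // pageSizes[i] is the byte size of the single-page PDF for page i (assumed input).
    // Returns groups of zero-based page indices, in page order.
    public static List<List<int>> GroupPagesBySize(IReadOnlyList<long> pageSizes, long limit)
    {
        var groups = new List<List<int>>();
        var current = new List<int>();
        long currentSize = 0;

        for (int i = 0; i < pageSizes.Count; i++)
        {
            long size = pageSizes[i];
            if (size > limit)
                throw new ArgumentException($"Page #{i + 1} ({size} bytes) exceeds the limit of {limit} bytes.");

            // Close the current group when this page would push it over the limit.
            if (current.Count > 0 && currentSize + size > limit)
            {
                groups.Add(current);
                current = new List<int>();
                currentSize = 0;
            }

            current.Add(i);
            currentSize += size;
        }

        if (current.Count > 0)
            groups.Add(current); // last, possibly partial, group

        return groups;
    }
}
```

Each returned group lists the pages that would then be merged into one output file. Because the sum is only an estimate, measuring the merged file is still the only reliable check against the limit.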

Answer 1

Starting from the PDFsharp Sample: Split Document, I wrote the following SplitBySize method:

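using System;
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

// Splits a PDF into "<name> - 1.pdf", "<name> - 2.pdf", ... in the current working directory,
// each at most `limit` bytes. CreateDocument(input) is a helper not shown in this answer;
// it returns a new, empty PdfDocument.
public static void SplitBySize(string filename, long limit)
{
    PdfDocument input = PdfReader.Open(filename, PdfDocumentOpenMode.Import);
    PdfDocument output = CreateDocument(input);

    string name = Path.GetFileNameWithoutExtension(filename);
    string temp = string.Format("{0} - {1}.pdf", name, 0); // scratch file used to measure the size
    int j = 1;                                              // number of the output file being filled
    for (int i = 0; i < input.PageCount; i++)
    {
        PdfPage page = input.Pages[i];
        output.AddPage(page);
        output.Save(temp);                                  // save to disk to measure the real size
        FileInfo info = new FileInfo(temp);
        if (info.Length <= limit)
        {
            // Still within the limit: keep this version as output file j.
            string path = string.Format("{0} - {1}.pdf", name, j);
            if (File.Exists(path))
            {
                File.Delete(path);
            }
            File.Move(temp, path);
        }
        else
        {
            if (output.PageCount > 1)
            {
                // Too big: file j already holds the previous version without page i,
                // so start file j + 1 and retry page i there.
                output = CreateDocument(input);
                ++j;
                --i;
            }
            else
            {
                // A single page on its own exceeds the limit, so no split can satisfy it.
                File.Delete(temp);
                throw new Exception(
                    string.Format("Page #{0} is greater than the document size limit of {1} MB (size = {2} bytes)",
                    i + 1,
                    limit / 1E6,
                    info.Length));
            }
        }
    }
}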

I will keep testing it, but so far it has been working.

Answer 2

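Here is some untested sample code. It assumes you are prepared to split at the raw binary level, which means a PDF reader cannot open the individual parts; you have to rejoin the parts before the file is readable again: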

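The code below first reads the PDF file into a byte[] array. Then, based on an arbitrary number of parts (5 in this example), it computes the size of each binary part. Finally, it creates a temporary MemoryStream and, in a loop, reads each part and writes it to a new .part file. (You may need to make some changes to get it working.)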

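        using System;
        using System.IO;

        byte[] pdfBytes = File.ReadAllBytes(@"C:\foo.pdf"); // @"..." so that "\f" is not read as a form-feed escape
        int parts = 5;
        int fileSize = (pdfBytes.Length + parts - 1) / parts; // round up; assuming foo is 40 MB, parts are about 8 MB
        MemoryStream m = new MemoryStream(pdfBytes);
        for (int i = 0; i < parts; i++)
        {
            byte[] tbytes = new byte[fileSize];
            int read = m.Read(tbytes, 0, fileSize); // offset indexes into tbytes; the stream position advances itself
            if (read == 0) break;                   // input smaller than expected: nothing left to write
            Array.Resize(ref tbytes, read);         // the last part may be shorter
            File.WriteAllBytes(@"C:\foo" + i + ".part", tbytes);
        }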
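
Because the .part files are not valid PDFs on their own, they must be concatenated in their original order before a reader can open the result. This is a minimal sketch of that rejoin step, assuming the part paths are supplied in the order they were written; PartJoiner and JoinParts are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: concatenates binary parts back into a single file.
// The parts must be passed in the same order in which they were split off.
public static class PartJoiner
{
    public static void JoinParts(IEnumerable<string> partPaths, string outputPath)
    {
        using (FileStream output = File.Create(outputPath))
        {
            foreach (string part in partPaths)
            {
                using (FileStream input = File.OpenRead(part))
                {
                    input.CopyTo(output); // append this part's bytes unchanged
                }
            }
        }
    }
}
```

For example, `PartJoiner.JoinParts(new[] { @"C:\foo0.part", @"C:\foo1.part", @"C:\foo2.part", @"C:\foo3.part", @"C:\foo4.part" }, @"C:\foo-joined.pdf");`. Listing the paths explicitly is safer than sorting them by file name, because a plain string sort would put foo10.part before foo2.part once there are more than ten parts.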

How to split a PDF file by file size using C#?
