iTextSharp: trimming pages from a PDF document

Date: 2011-08-30 15:59:34

Tags: c# .net itextsharp

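I have a PDF document with form fields that I fill in programmatically with C#. Depending on three conditions, I need to trim (delete) some of the pages of that document.

Is this possible?

Condition 1: I need to keep pages 1-4 but delete pages 5 and 6

Condition 2: I need to keep pages 1-4, delete page 5 and keep page 6

Condition 3: I need to keep pages 1-5 but delete page 6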

3 answers:

Answer 0 (score: 18)

Use PdfReader.SelectPages() together with PdfStamper. The following code uses iTextSharp 5.5.1.

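// Requires: using System.IO; using iTextSharp.text.pdf;
public void SelectPages(string inputPdf, string pageSelection, string outputPdf)
{
    using (PdfReader reader = new PdfReader(inputPdf))
    {
        // Keep only the pages matched by the selection string, e.g. "1-4" or "1-4,6"
        reader.SelectPages(pageSelection);

        // The stamper writes the reduced document to outputPdf
        using (PdfStamper stamper = new PdfStamper(reader, File.Create(outputPdf)))
        {
            // Closing the stamper flushes the output file
            stamper.Close();
        }
    }
}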

Then call this method with the appropriate page selection for each condition.

Condition 1:

SelectPages(inputPdf, "1-4", outputPdf);

Condition 2:

SelectPages(inputPdf, "1-4,6", outputPdf);

or, equivalently (select pages 1-6, then remove page 5):

SelectPages(inputPdf, "1-6,!5", outputPdf);

Condition 3:

SelectPages(inputPdf, "1-5", outputPdf);
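If the pages to keep are only known at runtime (for example, decided by the three conditions), the selection string can be generated instead of hard-coded. The sketch below is an illustration, not part of iTextSharp: `PageSelection.FromPages` is a hypothetical helper that sorts the page numbers, drops duplicates and merges consecutive pages into ranges, so it cannot express a custom page order.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of iTextSharp): turns a list of page numbers
// into a selection string such as "1-4,6" for PdfReader.SelectPages().
public static class PageSelection
{
    public static string FromPages(IEnumerable<int> pagesToKeep)
    {
        // Sort and de-duplicate so that consecutive pages can be merged into ranges
        List<int> pages = pagesToKeep.Distinct().OrderBy(p => p).ToList();
        var parts = new List<string>();

        int i = 0;
        while (i < pages.Count)
        {
            int start = pages[i];
            int end = start;
            // Extend the current run while the next page follows directly
            while (i + 1 < pages.Count && pages[i + 1] == end + 1)
            {
                end = pages[++i];
            }
            parts.Add(start == end ? start.ToString() : start + "-" + end);
            i++;
        }
        return string.Join(",", parts);
    }
}
```

For condition 2, `SelectPages(inputPdf, PageSelection.FromPages(new[] { 1, 2, 3, 4, 6 }), outputPdf);` would pass the string `"1-4,6"`.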

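Here is the comment from the iTextSharp source code that describes what makes up a page selection. It is in the SequenceList class, which is used to process page selections: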

/**
* This class expands a string into a list of numbers. The main use is to select a
* range of pages.
* <p>
* The general syntax is:<br>
* [!][o][odd][e][even]start-end
* <p>
* You can have multiple ranges separated by commas ','. The '!' modifier removes the
* range from what is already selected. The range changes are incremental, that is,
* numbers are added or deleted as the range appears. The start or the end, but not both, can be omitted.
*/
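To make the syntax concrete, here is a simplified, hypothetical re-implementation of the expansion described in that comment. It is not the iTextSharp source: `SelectionExpander.Expand` is an illustrative name, and unlike the real class it does not support descending ranges such as `6-1`. It shows how `"1-6,!5"` ends up selecting the same pages as `"1-4,6"`.

```csharp
using System;
using System.Collections.Generic;

// Simplified sketch of the page-selection syntax "[!][o][odd][e][even]start-end".
// Hypothetical helper, NOT the iTextSharp SequenceList code; descending ranges
// (e.g. "6-1") are not supported here.
public static class SelectionExpander
{
    public static List<int> Expand(string ranges, int maxPage)
    {
        var pages = new List<int>();
        foreach (string rawItem in ranges.Split(','))
        {
            string item = rawItem.Trim();
            if (item.Length == 0) continue;

            // A leading '!' removes the range from the pages selected so far
            bool remove = item[0] == '!';
            if (remove) item = item.Substring(1);

            // Optional filter: "odd"/"o" or "even"/"e" (longer forms checked first)
            bool oddOnly = false, evenOnly = false;
            if (item.StartsWith("odd", StringComparison.Ordinal)) { oddOnly = true; item = item.Substring(3); }
            else if (item.StartsWith("even", StringComparison.Ordinal)) { evenOnly = true; item = item.Substring(4); }
            else if (item.StartsWith("o", StringComparison.Ordinal)) { oddOnly = true; item = item.Substring(1); }
            else if (item.StartsWith("e", StringComparison.Ordinal)) { evenOnly = true; item = item.Substring(1); }

            // An omitted start defaults to 1, an omitted end to maxPage
            int low = 1, high = maxPage;
            if (item.Length > 0)
            {
                int dash = item.IndexOf('-');
                if (dash < 0)
                {
                    low = high = int.Parse(item); // a single page, e.g. "6"
                }
                else
                {
                    string start = item.Substring(0, dash).Trim();
                    string end = item.Substring(dash + 1).Trim();
                    if (start.Length > 0) low = int.Parse(start);
                    if (end.Length > 0) high = int.Parse(end);
                }
            }
            low = Math.Max(low, 1);
            high = Math.Min(high, maxPage);

            // Ranges are applied incrementally, left to right
            for (int page = low; page <= high; page++)
            {
                if (oddOnly && page % 2 == 0) continue;
                if (evenOnly && page % 2 != 0) continue;
                if (remove) pages.RemoveAll(p => p == page);
                else pages.Add(page);
            }
        }
        return pages;
    }
}
```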

Answer 1 (score: 6)

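Rather than deleting pages from the document, what you actually do is create a new document and import only the pages you want to keep. Below is a complete WinForms application (targeting iTextSharp 5.1.1.0). The last parameter of the function removePagesFromPdf is the array of pages to keep.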

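The code below works with physical files, but it is easy to convert it to use streams instead, so that nothing has to be written to disk if you don't want that.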

using System;
using System.ComponentModel;
using System.IO;
using System.Linq;
using System.Windows.Forms;
using iTextSharp.text.pdf;
using iTextSharp.text;


namespace Full_Profile1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            //The files that we are working with
            string sourceFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
            string sourceFile = Path.Combine(sourceFolder, "Test.pdf");
            string destFile = Path.Combine(sourceFolder, "TestOutput.pdf");

            //Remove all pages except 1,2,3,4 and 6
            removePagesFromPdf(sourceFile, destFile, 1, 2, 3, 4, 6);
            this.Close();
        }
        public void removePagesFromPdf(String sourceFile, String destinationFile, params int[] pagesToKeep)
        {
            //Used to pull individual pages from our source
            PdfReader r = new PdfReader(sourceFile);
            try
            {
                //Create our destination file
                using (FileStream fs = new FileStream(destinationFile, FileMode.Create, FileAccess.Write, FileShare.None))
                {
                    using (Document doc = new Document())
                    {
                        using (PdfWriter w = PdfWriter.GetInstance(doc, fs))
                        {
                            //Open the destination for writing
                            doc.Open();
                            //Loop through each page that we want to keep
                            foreach (int page in pagesToKeep)
                            {
                                //Use the source page's size so the imported content is not clipped
                                doc.SetPageSize(r.GetPageSize(page));
                                //Add a new blank page to destination document
                                doc.NewPage();
                                //Extract the given page from our reader and add it directly to the destination PDF
                                w.DirectContent.AddTemplate(w.GetImportedPage(r, page), 0, 0);
                            }
                            //Close our document
                            doc.Close();
                        }
                    }
                }
            }
            finally
            {
                //Release the source file
                r.Close();
            }
        }
    }
}
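Since removePagesFromPdf takes the pages to keep, while the conditions in the question are phrased as pages to delete, a small helper can compute the complement. `PageMath.PagesExcept` is a hypothetical name used only for this sketch; in real code `pageCount` would come from `PdfReader.NumberOfPages`, but here it is a plain parameter so the sketch stays independent of iTextSharp.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of iTextSharp): returns the 1-based page numbers
// from 1..pageCount that are not listed in pagesToRemove, in ascending order.
public static class PageMath
{
    public static int[] PagesExcept(int pageCount, params int[] pagesToRemove)
    {
        var removed = new HashSet<int>(pagesToRemove);
        return Enumerable.Range(1, pageCount)
                         .Where(page => !removed.Contains(page))
                         .ToArray();
    }
}
```

For a 6-page document, `removePagesFromPdf(sourceFile, destFile, PageMath.PagesExcept(6, 5));` keeps pages 1, 2, 3, 4 and 6 (condition 2); the int[] result is accepted directly by the `params int[]` parameter.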

Answer 2 (score: 3)

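Here is the code I use to copy every page except the last one from an existing PDF. Everything happens in memory streams. The variable pdfByteArray is the byte[] of the original PDF, obtained with ms.ToArray(); it is overwritten with the new PDF.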

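// Requires: using System.IO; using iTextSharp.text; using iTextSharp.text.pdf;
PdfReader originalPDFReader = new PdfReader(pdfByteArray);

using (MemoryStream msCopy = new MemoryStream())
{
    using (Document docCopy = new Document())
    {
        using (PdfCopy copy = new PdfCopy(docCopy, msCopy))
        {
            docCopy.Open();
            // Copy every page except the last one (page numbers are 1-based)
            for (int pageNum = 1; pageNum <= originalPDFReader.NumberOfPages - 1; pageNum++)
            {
                copy.AddPage(copy.GetImportedPage(originalPDFReader, pageNum));
            }
            docCopy.Close();
        }
    }

    // MemoryStream.ToArray() still works after the stream has been closed
    pdfByteArray = msCopy.ToArray();
}

originalPDFReader.Close();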