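I have a PDF document containing form fields that I fill in programmatically with C#. Depending on three conditions, I need to trim (delete) some pages from that document.
Is this possible?
Condition 1: keep pages 1-4 but delete pages 5 and 6
Condition 2: keep pages 1-4, delete page 5, and keep page 6
Condition 3: keep pages 1-5 but delete page 6
Answer 0 (score: 18)
Use PdfReader.SelectPages() together with PdfStamper. The code below uses iTextSharp 5.5.1.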
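    // Requires: using System.IO; using iTextSharp.text.pdf;
    public void SelectPages(string inputPdf, string pageSelection, string outputPdf)
    {
        using (PdfReader reader = new PdfReader(inputPdf))
        {
            // Restrict the reader to the pages named in the selection string
            reader.SelectPages(pageSelection);
            // The stamper writes the remaining pages to the output file
            using (PdfStamper stamper = new PdfStamper(reader, File.Create(outputPdf)))
            {
                stamper.Close();
            }
        }
    }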
Then call this method with the appropriate page selection for each condition.
Condition 1:
    SelectPages(inputPdf, "1-4", outputPdf);
Condition 2:
    SelectPages(inputPdf, "1-4,6", outputPdf);
or
    SelectPages(inputPdf, "1-6,!5", outputPdf);
Condition 3:
    SelectPages(inputPdf, "1-5", outputPdf);
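The three calls above differ only in whether pages 5 and 6 are kept, so the selection string can be derived from two flags. The following is a minimal sketch under that assumption; PageSelectionHelper and BuildSelection are hypothetical names and are not part of iTextSharp.

```csharp
using System;

public static class PageSelectionHelper
{
    // Hypothetical helper (not an iTextSharp API): builds a selection string
    // for a six-page document in which pages 1-4 are always kept.
    public static string BuildSelection(bool keepPage5, bool keepPage6)
    {
        string selection = "1-4";
        if (keepPage5) selection += ",5";
        if (keepPage6) selection += ",6";
        return selection;
    }
}
```

For example, SelectPages(inputPdf, PageSelectionHelper.BuildSelection(false, true), outputPdf) would cover condition 2. For condition 3 the helper yields "1-4,5", which names the same pages as "1-5".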
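Here is the comment from the iTextSharp source code describing what makes up a page selection. It comes from the SequenceList class, which handles page selections: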
/**
* This class expands a string into a list of numbers. The main use is to select a
* range of pages.
* <p>
* The general systax is:<br>
* [!][o][odd][e][even]start-end
* <p>
* You can have multiple ranges separated by commas ','. The '!' modifier removes the
* range from what is already selected. The range changes are incremental, that is,
* numbers are added or deleted as the range appears. The start or the end, but not both, can be ommited.
*/
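To make the incremental behavior described in that comment concrete, here is a simplified sketch of how a selection such as "1-6,!5" could expand into a page list. It is not the SequenceList implementation: SimplePageRange and Expand are hypothetical names, and the sketch handles only single numbers, start-end ranges, and the '!' modifier. The odd/even modifiers and ranges with an omitted start or end are not supported.

```csharp
using System;
using System.Collections.Generic;

public static class SimplePageRange
{
    // Simplified illustration only, not iTextSharp's SequenceList.
    // Supports "n", "a-b", and a leading '!' that removes pages already selected.
    public static List<int> Expand(string selection, int maxPage)
    {
        var pages = new List<int>();
        foreach (string raw in selection.Split(','))
        {
            string token = raw.Trim();
            if (token.Length == 0) continue;

            // '!' means: remove this range from what has been selected so far
            bool remove = token.StartsWith("!");
            if (remove) token = token.Substring(1);

            string[] parts = token.Split('-');
            int start = int.Parse(parts[0]);
            int end = parts.Length > 1 ? int.Parse(parts[1]) : start;

            for (int page = start; page <= end; page++)
            {
                if (page < 1 || page > maxPage) continue; // ignore out-of-range pages
                if (remove) pages.RemoveAll(p => p == page);
                else pages.Add(page);
            }
        }
        return pages;
    }
}
```

Under these assumptions, Expand("1-6,!5", 6) and Expand("1-4,6", 6) both produce the pages 1, 2, 3, 4, 6, which is why the two calls for condition 2 are interchangeable.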
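Answer 1 (score: 6)
Rather than deleting pages from the document, what you actually do is create a new document and import only the pages you want to keep. Below is a complete WinForms application (targeting iTextSharp 5.1.1.0). The last parameter of the removePagesFromPdf function is an array of the pages to keep.
The code below works with physical files, but it can easily be converted to use streams so that nothing has to be written to disk if you would rather avoid that.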
    using System;
    using System.ComponentModel;
    using System.IO;
    using System.Linq;
    using System.Windows.Forms;
    using iTextSharp.text.pdf;
    using iTextSharp.text;

    namespace Full_Profile1
    {
        public partial class Form1 : Form
        {
            public Form1()
            {
                InitializeComponent();
            }

            private void Form1_Load(object sender, EventArgs e)
            {
                //The files that we are working with
                string sourceFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
                string sourceFile = Path.Combine(sourceFolder, "Test.pdf");
                string destFile = Path.Combine(sourceFolder, "TestOutput.pdf");
                //Remove all pages except 1,2,3,4 and 6
                removePagesFromPdf(sourceFile, destFile, 1, 2, 3, 4, 6);
                this.Close();
            }

            public void removePagesFromPdf(String sourceFile, String destinationFile, params int[] pagesToKeep)
            {
                //Used to pull individual pages from our source
                PdfReader r = new PdfReader(sourceFile);
                //Create our destination file
                using (FileStream fs = new FileStream(destinationFile, FileMode.Create, FileAccess.Write, FileShare.None))
                {
                    using (Document doc = new Document())
                    {
                        using (PdfWriter w = PdfWriter.GetInstance(doc, fs))
                        {
                            //Open the destination for writing
                            doc.Open();
                            //Loop through each page that we want to keep
                            foreach (int page in pagesToKeep)
                            {
                                //Add a new blank page to destination document
                                doc.NewPage();
                                //Extract the given page from our reader and add it directly to the destination PDF
                                w.DirectContent.AddTemplate(w.GetImportedPage(r, page), 0, 0);
                            }
                            //Close our document
                            doc.Close();
                        }
                    }
                }
                //Release the source file once all pages have been written
                r.Close();
            }
        }
    }
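Since removePagesFromPdf expects the pages to keep while the question is phrased in terms of pages to delete, a small conversion step may help. The sketch below assumes 1-based page numbers; PageListHelper and PagesToKeep are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class PageListHelper
{
    // Hypothetical helper: returns the 1-based page numbers in 1..totalPages
    // that are not listed in pagesToRemove, in ascending order.
    public static int[] PagesToKeep(int totalPages, params int[] pagesToRemove)
    {
        var remove = new HashSet<int>(pagesToRemove);
        var keep = new List<int>();
        for (int page = 1; page <= totalPages; page++)
        {
            if (!remove.Contains(page)) keep.Add(page);
        }
        return keep.ToArray();
    }
}
```

The result can be passed straight to the params argument, for example removePagesFromPdf(sourceFile, destFile, PageListHelper.PagesToKeep(6, 5)) for condition 2. The total page count is hard-coded here; it could instead be read from PdfReader.NumberOfPages.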
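Answer 2 (score: 3)
Here is the code I use to copy everything except the last page of an existing PDF. Everything happens in memory streams. The variable pdfByteArray is a byte[] holding the original PDF, obtained with ms.ToArray(). pdfByteArray is then overwritten with the new PDF.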
    PdfReader originalPDFReader = new PdfReader(pdfByteArray);
    using (MemoryStream msCopy = new MemoryStream())
    {
        using (Document docCopy = new Document())
        {
            using (PdfCopy copy = new PdfCopy(docCopy, msCopy))
            {
                docCopy.Open();
                // Copy every page except the last one
                for (int pageNum = 1; pageNum <= originalPDFReader.NumberOfPages - 1; pageNum++)
                {
                    copy.AddPage(copy.GetImportedPage(originalPDFReader, pageNum));
                }
                docCopy.Close();
            }
        }
        // Replace the original bytes with the trimmed PDF
        pdfByteArray = msCopy.ToArray();
    }
    originalPDFReader.Close();