Open a PDF file at a specific page using PDFBox

Time: 2014-01-31 18:07:46

Tags: c# java pdfbox

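I have a program that searches all the PDF files in a folder for some text, for example a sentence. It works perfectly...

But I would like to add a feature that opens the file at the exact page where the sentence occurs. I looked through the PDFBox documentation but could not find anything specific.

I don't know whether I overlooked something, but I would be very grateful if someone could enlighten me.

Thanks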

1 answer:

Answer 0: (score: 2)

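I read your question earlier this week. At the time I did not have an answer for you. Then I stumbled upon the methods setStartPage() and setEndPage() in the PDFBox documentation for the PDFTextStripper class, which reminded me of your question and led to this answer. It has been 4 months since you asked, but maybe this will help someone. I know I learned a thing or two while writing it.

When you extract text from a PDF file, you can restrict the extraction to a range of pages. The methods setStartPage() and setEndPage() set that range, and page numbers start at 1. If we set the start page and the end page to the same page number, the extracted text comes from that single page, so whenever the search term is found we know exactly which page it is on.

The code below comes from a Windows Forms application, but you can adapt it to fit your own application.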

using System;
using System.Windows.Forms;
using org.apache.pdfbox.pdmodel;
using org.apache.pdfbox.util;
//The Diagnostics namespace is needed to specify PDF open parameters. More on them later.
using System.Diagnostics;
//the statements below belong inside a method, e.g. a button click handler
//specify the string you are searching for
string searchTerm = "golden";
//I am using a static file path
string pdfFilePath = @"F:\myFile.pdf";
//load the document
PDDocument document = PDDocument.load(pdfFilePath);
//get the number of pages
int numberOfPages = document.getNumberOfPages();
//create an instance of text stripper to get text from pdf document
PDFTextStripper stripper = new PDFTextStripper();
//loop through all the pages. We will search page by page
for (int pageNumber = 1; pageNumber <= numberOfPages; pageNumber++)
{
    //set the start page
    stripper.setStartPage(pageNumber);
    //set the end page
    stripper.setEndPage(pageNumber);
    //get the text from the page range we set above.
    //in this case we are searching one page.
    //I used the ToLower method to make all the text lowercase
    string pdfText = stripper.getText(document).ToLower();
    //just for fun, display the text on each page in a messagebox. My pdf file only has two pages. But this might be annoying to you if you have more.
    MessageBox.Show(pdfText);
    //search the pdfText for the search term.
    //the page text was lowercased above, so lowercase the search term too
    if (pdfText.Contains(searchTerm.ToLower()))
    {
        //just for fun, display the page number on which we found the search term
        MessageBox.Show("Found the search term on page " + pageNumber);
        //create a process. We will be opening the pdf document to a specific page number
        Process myProcess = new Process();
        //I specified Adobe Acrobat as the program to open
        myProcess.StartInfo.FileName = "Acrobat.exe";
        //see link below for info on PDF document open parameters.
        //note the space before the file path and the quotes around it
        myProcess.StartInfo.Arguments = "/A \"page=" + pageNumber + "=OpenActions\" \"" + pdfFilePath + "\"";
        //Start the process
        myProcess.Start();
        //break out of the loop. we found our search term and we opened the PDF file
        break;
    }
}
//close the document we opened.
document.close();

See Adobe's document on PDF open parameters: http://partners.adobe.com/public/developer/en/acrobat/PDFOpenParameters.pdf
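
If you open files from more than one place, it may help to build the argument string in one spot. The following is only a minimal sketch: the class PdfOpenArguments and its method ForPage are hypothetical names, not part of PDFBox or Acrobat. The "=OpenActions" suffix is copied from the code in this answer, and the separating space and the quotes around the file path are my assumption about what the command line needs when the path contains spaces.

```csharp
using System;

static class PdfOpenArguments
{
    // Hypothetical helper: builds the arguments that ask Acrobat to open
    // a PDF file at a given page. Pages are numbered starting at 1.
    public static string ForPage(string pdfFilePath, int pageNumber)
    {
        if (string.IsNullOrEmpty(pdfFilePath))
            throw new ArgumentException("A file path is required.", "pdfFilePath");
        if (pageNumber < 1)
            throw new ArgumentOutOfRangeException("pageNumber", "Pages are numbered from 1.");

        // The open parameter and the file path are two separate quoted
        // arguments, so a path containing spaces is not split apart.
        return "/A \"page=" + pageNumber + "=OpenActions\" \"" + pdfFilePath + "\"";
    }
}
```

You would then assign the result to the process arguments, for example myProcess.StartInfo.Arguments = PdfOpenArguments.ForPage(pdfFilePath, pageNumber);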