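I have a program that searches all the PDF files in a folder for, say, a sentence. It works perfectly...
But I would like to add a feature that reports the exact page on which the sentence appears. I looked through the PDFBox documentation, but I could not find anything specific.
I don't know whether I overlooked something, but I would be very grateful if someone could enlighten me.
Thanks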
Answer 0 (score: 2)
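I read your question earlier this week. At the time, I did not have an answer for you. Then I stumbled upon the methods setStartPage() and setEndPage() in the PDFBox documentation for the PDFTextStripper class, which reminded me of your question and led to this answer. It has been about 4 months since you asked, but maybe this will help someone. I know I learned a thing or two while writing it.
When extracting text from a PDF file, you can restrict the extraction to a range of pages. The methods setStartPage() and setEndPage() set that page range, with pages numbered starting at 1. If we set the start page and the end page to the same page number, the extracted text comes from that one page only, so we know exactly which page the search term was found on.
In the code below I am using a Windows Forms application, but you can adapt my code to suit your application.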
using System;
using System.Windows.Forms;
using org.apache.pdfbox.pdmodel;
using org.apache.pdfbox.util;
//The Diagnostics namespace is needed to specify PDF open parameters. More on them later.
using System.Diagnostics;
//specify the string you are searching for
string searchTerm = "golden";
//I am using a static file path
string pdfFilePath = @"F:\myFile.pdf";
//load the document
PDDocument document = PDDocument.load(pdfFilePath);
//get the number of pages
int numberOfPages = document.getNumberOfPages();
//create an instance of text stripper to get text from pdf document
PDFTextStripper stripper = new PDFTextStripper();
//loop through all the pages. We will search page by page
for (int pageNumber = 1; pageNumber <= numberOfPages; pageNumber++)
{
    //set the start page
    stripper.setStartPage(pageNumber);
    //set the end page
    stripper.setEndPage(pageNumber);
    //get the text from the page range we set above.
    //in this case we are searching one page.
    //I used the ToLower method to make all the text lowercase
    string pdfText = stripper.getText(document).ToLower();
    //just for fun, display the text on each page in a messagebox. My pdf file only has two pages. But this might be annoying to you if you have more.
    MessageBox.Show(pdfText);
    //search the pdfText for the search term (lowercased too, so the comparison ignores case)
    if (pdfText.Contains(searchTerm.ToLower()))
    {
        //just for fun, display the page number on which we found the search term
        MessageBox.Show("Found the search term on page " + pageNumber);
        //create a process. We will be opening the pdf document to a specific page number
        Process myProcess = new Process();
        //I specified Adobe Acrobat as the program to open
        myProcess.StartInfo.FileName = "Acrobat.exe";
        //see link below for info on PDF document open parameters
        //the file path is quoted and separated from the /A option by a space
        myProcess.StartInfo.Arguments = "/A \"page=" + pageNumber + "=OpenActions\" \"" + pdfFilePath + "\"";
        //Start the process
        myProcess.Start();
        //break out of the loop. we found our search term and we opened the PDF file
        break;
    }
}
//close the document we opened.
document.close();
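The loop above stops at the first page that contains the search term. If you would rather know every page on which it occurs, the matching step could be sketched as below. This is only an illustration: the names PageSearch and FindPages are hypothetical, and the sketch assumes you have already collected the text of each page into a list (one entry per page, extracted one page at a time as in the loop above), so it does not call PDFBox itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of PDFBox.
public static class PageSearch
{
    // pageTexts[0] is assumed to hold the text of page 1, pageTexts[1] page 2, and so on.
    // Returns the 1-based numbers of all pages whose text contains searchTerm, ignoring case.
    public static List<int> FindPages(IList<string> pageTexts, string searchTerm)
    {
        var matches = new List<int>();
        for (int i = 0; i < pageTexts.Count; i++)
        {
            string text = pageTexts[i];
            // skip pages for which no text was extracted
            if (text == null)
            {
                continue;
            }
            // IndexOf with OrdinalIgnoreCase avoids lowercasing the whole page
            if (text.IndexOf(searchTerm, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                matches.Add(i + 1);
            }
        }
        return matches;
    }
}
```

Filling the list with one stripper.getText(document) call per page and passing it to FindPages would then give you all matching page numbers instead of only the first.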
See Adobe's documentation on PDF open parameters: http://partners.adobe.com/public/developer/en/acrobat/PDFOpenParameters.pdf
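Because the argument string is easy to get wrong (the option and the file path need a space between them, and a path containing spaces needs quotes), it may help to build it in one place. Here is a minimal sketch, assuming the same /A "page=N=OpenActions" format used in the code above. The class name AcrobatArguments and the method ForPage are hypothetical names of my own, not part of any Adobe or PDFBox API.

```csharp
using System;

// Hypothetical helper for building the command-line arguments passed to Acrobat.
public static class AcrobatArguments
{
    // Returns a string of the form: /A "page=N=OpenActions" "path"
    public static string ForPage(int pageNumber, string pdfFilePath)
    {
        if (pageNumber < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(pageNumber), "Page numbers start at 1.");
        }
        if (string.IsNullOrEmpty(pdfFilePath))
        {
            throw new ArgumentException("A file path is required.", nameof(pdfFilePath));
        }
        // quote the path so that folders or file names with spaces stay one argument
        return "/A \"page=" + pageNumber + "=OpenActions\" \"" + pdfFilePath + "\"";
    }
}
```

You could then write myProcess.StartInfo.Arguments = AcrobatArguments.ForPage(pageNumber, pdfFilePath); in place of the hand-built string.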