I am trying to search for a string in a PDF file using Aspose.PDF.
This is what I am doing:
import com.aspose.pdf.*;
import java.util.List;

// Add the Windows font folder to the font paths Aspose.PDF searches
String path = "C:/Windows/Fonts";
List<String> list = Document.getLocalFontPaths();
list.add(path);
Document.setLocalFontPaths(list);

Document pdfDocument = new Document("myFile.pdf");
PageCollection pages = pdfDocument.getPages();
TextAbsorber textAbsorber = new TextAbsorber(
        new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Raw));

// Page indexes in Aspose.PDF are 1-based
for (int i = 1; i <= pages.size(); i++) {
    Page currentPage = pages.get_Item(i);
    currentPage.accept(textAbsorber);   // the NullPointerException is thrown here
    String abText = textAbsorber.getText();
    String[] abArray = abText.trim().split("\n");
    for (String txtArray : abArray) {
        if (txtArray.contains("SomeText")) {
            // do something
        }
    }
}
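One detail on the search side, unrelated to the exception: `split("\n")` leaves a trailing `\r` on each line when the extracted text uses Windows line endings. That is harmless for `contains`, but it matters if you later compare or print whole lines. A minimal plain-Java sketch of the per-line search, assuming the page text has already been extracted into a `String` (the class `LineSearch` and method `findMatchingLines` are hypothetical names), splits on `\R`, which matches any line break. It does not call Aspose.PDF at all, so it cannot explain the NullPointerException itself.

```java
import java.util.ArrayList;
import java.util.List;

public class LineSearch {
    // Returns every line of the extracted text that contains the search string.
    // "\\R" matches any line break (\r\n, \n, \r, ...), so no stray '\r'
    // is left at the end of a line.
    static List<String> findMatchingLines(String extractedText, String needle) {
        List<String> matches = new ArrayList<>();
        if (extractedText == null || extractedText.isEmpty()) {
            return matches;
        }
        for (String line : extractedText.split("\\R")) {
            if (line.contains(needle)) {
                matches.add(line.trim());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical page text standing in for TextAbsorber.getText() output
        String pageText = "Invoice 42\r\nSomeText appears here\r\nOther line\n";
        System.out.println(findMatchingLines(pageText, "SomeText"));
    }
}
```

In the loop above, this would replace the inner `split`/`contains` part, for example `LineSearch.findMatchingLines(abText, "SomeText")`.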
A NullPointerException is thrown at currentPage.accept(textAbsorber);
Stack trace:
java.lang.NullPointerException
at com.aspose.pdf.internal.p51.z11.m2(Unknown Source)
at com.aspose.pdf.internal.p51.z11.m7(Unknown Source)
at com.aspose.pdf.internal.p51.z13.m1(Unknown Source)
at com.aspose.pdf.internal.p51.z13.m1(Unknown Source)
at com.aspose.pdf.internal.p51.z13.m6(Unknown Source)
at com.aspose.pdf.internal.p51.z13.<init>(Unknown Source)
at com.aspose.pdf.internal.p51.z13.<init>(Unknown Source)
at com.aspose.pdf.TextAbsorber.visit(Unknown Source)
at com.aspose.pdf.Page.accept(Unknown Source)
What could be causing this?
Answer (score: 0)
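You do not need to split or trim the extracted string yourself to find text in a PDF file. The Aspose.PDF API can search for text directly: TextFragmentAbsorber finds every instance of a search phrase. Please try the following code snippet to search for and extract text from a PDF document.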
import com.aspose.pdf.*;

// Open document
Document pdfDocument = new Document("input.pdf");
// Create a TextFragmentAbsorber object to find all instances of the input search phrase
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("SEARCH STRING");
// Accept the absorber for all pages of the document
pdfDocument.getPages().accept(textFragmentAbsorber);
// Get the extracted text fragments into a collection
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();
// Loop through the text fragments
for (TextFragment textFragment : (Iterable<TextFragment>) textFragmentCollection) {
    // Iterate through the text segments of each fragment
    for (TextSegment textSegment : (Iterable<TextSegment>) textFragment.getSegments()) {
        System.out.println("Text :- " + textSegment.getText());
    }
}
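For comparison only, here is a minimal plain-Java sketch of "find every instance of a phrase" when you already have a page's text as a `String`. It is not how TextFragmentAbsorber works internally and it only reports character offsets, whereas the absorber returns TextFragment objects with their segments on the page. The class `PhraseSearch` and method `findAll` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class PhraseSearch {
    // Returns the start offset of every occurrence of phrase in text,
    // including overlapping ones, by repeatedly calling String.indexOf
    // starting one character after the previous match.
    static List<Integer> findAll(String text, String phrase) {
        List<Integer> offsets = new ArrayList<>();
        if (text == null || phrase == null || phrase.isEmpty()) {
            return offsets;
        }
        int from = text.indexOf(phrase);
        while (from >= 0) {
            offsets.add(from);
            from = text.indexOf(phrase, from + 1);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Hypothetical page text
        String pageText = "SEARCH STRING here, and SEARCH STRING again";
        System.out.println(findAll(pageText, "SEARCH STRING"));
    }
}
```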
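For details on text extraction, please see Search and Get Text from Pages of a PDF Document. If you still run into problems, please share the source PDF file with us and tell us which text you want to extract.
PS: I work with Aspose as a Developer Evangelist.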