Unable to read the contents of a PDF file using Aspose.PDF in Java

Time: 2018-05-04 21:06:07

Tags: java aspose aspose.pdf

I am trying to search for a string in a PDF file using Aspose.PDF.

This is what I am doing:

String path = "C:/Windows/Fonts";
List list = Document.getLocalFontPaths();
list.add(path);
Document.setLocalFontPaths(list);
Document pdfDocument = new Document("myFile.pdf");
PageCollection pages = pdfDocument.getPages();
TextAbsorber textAbsorber = new TextAbsorber(
        new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Raw));

// Aspose page indexes start at 1
for (int i = 1; i <= pages.size(); i++) {
    Page currentPage = pages.get_Item(i);
    currentPage.accept(textAbsorber);
    String abText = textAbsorber.getText();
    String[] abArray = abText.trim().split("\n");
    for (String txtArray : abArray) {
        if (txtArray.contains("SomeText")) {
            // do something
        }
    }
}
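
The line-matching part of the loop above does not depend on Aspose, so it can be checked on plain strings, separately from the extraction step that throws. A minimal sketch, assuming the text has already been extracted; `LineSearch` and `findMatchingLines` are hypothetical names, not part of Aspose.PDF:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Aspose.PDF: it only works on text
// that has already been extracted from a page.
class LineSearch {

    // Returns every line of `text` that contains `needle`,
    // with leading and trailing whitespace trimmed.
    static List<String> findMatchingLines(String text, String needle) {
        List<String> matches = new ArrayList<>();
        if (text == null || needle == null) {
            return matches; // nothing to search
        }
        // "\\R" matches any line break (\n, \r\n, \r), unlike split("\n").
        for (String line : text.split("\\R")) {
            if (line.contains(needle)) {
                matches.add(line.trim());
            }
        }
        return matches;
    }
}
```

In the loop above this would be called as `LineSearch.findMatchingLines(abText, "SomeText")`. Guarding against a null `text` keeps this step from throwing a NullPointerException of its own if extraction returns nothing.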

A NullPointerException is thrown at this line: currentPage.accept(textAbsorber);

Error stack trace:

java.lang.NullPointerException
    at com.aspose.pdf.internal.p51.z11.m2(Unknown Source)
    at com.aspose.pdf.internal.p51.z11.m7(Unknown Source)
    at com.aspose.pdf.internal.p51.z13.m1(Unknown Source)
    at com.aspose.pdf.internal.p51.z13.m1(Unknown Source)
    at com.aspose.pdf.internal.p51.z13.m6(Unknown Source)
    at com.aspose.pdf.internal.p51.z13.<init>(Unknown Source)
    at com.aspose.pdf.internal.p51.z13.<init>(Unknown Source)
    at com.aspose.pdf.TextAbsorber.visit(Unknown Source)
    at com.aspose.pdf.Page.accept(Unknown Source)

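What could be the cause?

1 Answer:

Answer 0: (score: 0)

You do not need to split or trim strings in order to extract text from a PDF file. The Aspose.PDF API supports text extraction directly. Please try the following code snippet to extract text from a PDF document.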

// Open document
Document pdfDocument = new Document("input.pdf");

// Create TextFragmentAbsorber object to find all instances of the input search phrase
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("SEARCH STRING");

// Accept the absorber for all pages of the document
pdfDocument.getPages().accept(textFragmentAbsorber);

// Get the extracted text fragments into collection
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();

// Loop through the Text fragments
for (TextFragment textFragment : (Iterable<TextFragment>) textFragmentCollection) {
    // Iterate through text segments
    for (TextSegment textSegment : (Iterable<TextSegment>) textFragment.getSegments()) {
        System.out.println("Text :- " + textSegment.getText());
    }
}

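For more details on text extraction, see Search and Get Text from Pages of a PDF Document. If you run into any problems, please share the source PDF file with us and mention the text you want to extract.

PS: I work at Aspose as a Developer Evangelist.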