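I want to write a program that extracts the topic, authors, abstract, and other information from a paper. Is this possible? How should I go about it?
Answer 0 (score: 0)
Assuming you have added the PDFBox jar to your project, the code below retrieves some basic document properties (the metadata) of a PDF: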
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

public class ReadPdf {
    public static void main(String[] args) throws IOException {
        // Path to an existing PDF (adjust for your machine)
        File file = new File("C:/Users/user1/Desktop/test.pdf");
        // PDDocument.load is the PDFBox 2.x API; PDFBox 3.x uses Loader.loadPDF(file) instead.
        // try-with-resources closes the document even if an exception is thrown.
        try (PDDocument document = PDDocument.load(file)) {
            // Get the document information dictionary (the PDF's metadata)
            PDDocumentInformation pdd = document.getDocumentInformation();
            // Print each property; a getter returns null when the PDF does not set that field
            System.out.println("Author of the document is: " + pdd.getAuthor());
            System.out.println("Title of the document is: " + pdd.getTitle());
            System.out.println("Subject of the document is: " + pdd.getSubject());
            System.out.println("Creator of the document is: " + pdd.getCreator());
            System.out.println("Creation date of the document is: " + pdd.getCreationDate());
            System.out.println("Modification date of the document is: " + pdd.getModificationDate());
            System.out.println("Keywords of the document are: " + pdd.getKeywords());
        }
    }
}
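The date getters return a java.util.Calendar, and any property can be null when the PDF does not set it, so the raw output can be hard to read. Below is a minimal sketch of two formatting helpers that use only the Java standard library. The class name MetadataFormat and the methods formatDate and orNotSet are hypothetical names chosen for illustration; they are not part of PDFBox.

```java
import java.time.format.DateTimeFormatter;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical helper class (not part of PDFBox) for printing metadata values.
public class MetadataFormat {

    // Formats a Calendar as an ISO-8601 timestamp, or a placeholder when it is null.
    public static String formatDate(Calendar cal) {
        if (cal == null) {
            return "(not set)";
        }
        return DateTimeFormatter.ISO_OFFSET_DATE_TIME.format(
                cal.toInstant().atZone(cal.getTimeZone().toZoneId()));
    }

    // Returns the trimmed value, or a placeholder when it is null or blank.
    public static String orNotSet(String value) {
        return (value == null || value.trim().isEmpty()) ? "(not set)" : value.trim();
    }

    public static void main(String[] args) {
        // Build a sample date by hand; in practice this would come from pdd.getCreationDate().
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(2020, Calendar.JANUARY, 2, 3, 4, 5);
        System.out.println("Creation date: " + formatDate(cal));
        System.out.println("Author: " + orNotSet(null));
    }
}
```

With helpers like these, a call such as formatDate(pdd.getCreationDate()) would replace the direct string concatenation in the code above.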
For more document properties, see the PDFBox API documentation for PDDocumentInformation. HTH.
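Note that the abstract is not one of these metadata properties, and many papers leave the author and title fields empty as well, so the abstract usually has to be pulled out of the page text. A rough sketch of one way to do that follows. It assumes you have already obtained the plain text of the paper as a String (in PDFBox 2.x, for example, via new PDFTextStripper().getText(document)). The class AbstractExtractor and its heading heuristic are hypothetical: it looks for a line starting with "Abstract" and stops at the next "Keywords", "Index Terms", or "Introduction" heading. It will not work for every paper layout.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical heuristic (not part of PDFBox): finds the abstract in already-extracted text.
public class AbstractExtractor {

    // Starts at a line beginning with "Abstract" and captures lazily up to the next
    // "Keywords" / "Index Terms" / (optionally numbered) "Introduction" heading, or end of text.
    private static final Pattern ABSTRACT = Pattern.compile(
            "(?ims)^\\s*abstract\\b[\\s:.\\-]*(.+?)"
            + "(?=^\\s*(?:keywords\\b|index terms\\b|(?:1\\.?|I\\.)?\\s*introduction\\b)|\\z)");

    public static Optional<String> extractAbstract(String text) {
        Matcher m = ABSTRACT.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        // Collapse line breaks and repeated spaces left over from the PDF layout.
        String body = m.group(1).replaceAll("\\s+", " ").trim();
        return body.isEmpty() ? Optional.empty() : Optional.of(body);
    }

    public static void main(String[] args) {
        // Sample text standing in for the output of a PDF text extractor.
        String sample = "A Study of Things\nJane Doe\n\nAbstract\nWe study things.\n"
                + "Results are good.\n\n1 Introduction\nThings matter.";
        // Prints: We study things. Results are good.
        System.out.println(extractAbstract(sample).orElse("(no abstract found)"));
    }
}
```

Authors and titles can be approached the same way (for instance, by taking the first lines of the first page), but any such rule depends on the layout, so treat the results as best-effort guesses.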