I'm trying to extract the text from a URL that serves a PDF file, but I get this message:
INFO: Document is encrypted
May 27, 2015 9:27:50 AM org.apache.pdfbox.filter.FlateFilter decode
// Imports (PDFBox 1.8.x package names); `driver` is an existing Selenium WebDriver field
import java.io.BufferedInputStream;
import java.io.IOException;
import java.net.URL;
import java.util.concurrent.TimeUnit;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

public void getTextFromPdf(String urlS) throws IOException {
    driver.get(urlS);
    driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
    // Open the URL the browser actually ended up on (after any redirects)
    URL url = new URL(driver.getCurrentUrl());
    try (BufferedInputStream fileToParse = new BufferedInputStream(url.openStream())) {
        // parse() -- parses the stream and populates the COSDocument object,
        // the in-memory representation of the PDF document
        PDFParser parser = new PDFParser(fileToParse);
        parser.parse();
        // getPDDocument() -- returns the parsed PD document; close() it when done to release resources
        PDDocument document = parser.getPDDocument();
        try {
            System.out.println(urlS);
            // PDFTextStripper strips out all of the text and ignores the formatting
            String output = new PDFTextStripper().getText(document);
            System.out.println(output);
        } finally {
            document.close();
        }
    }
    driver.manage().timeouts().implicitlyWait(100, TimeUnit.SECONDS);
}
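To see what the URL actually returns before PDFBox parses it, the raw bytes can be checked with plain JDK code. This is a minimal sketch, not PDFBox or Selenium API: `PdfProbe`, `readAll`, `looksLikePdf` and `hasEncryptEntry` are hypothetical names. It assumes that an encrypted PDF carries an `/Encrypt` entry in its trailer (or cross-reference stream) dictionary, which is stored uncompressed, so a plain byte search is only a heuristic and may give false positives.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: quick byte-level checks on whatever a URL returns. */
final class PdfProbe {

    private PdfProbe() {}

    /** Reads the whole stream into memory (Java 7/8 compatible). */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** True if the data starts with the "%PDF-" header line a PDF file begins with. */
    static boolean looksLikePdf(byte[] data) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data is safe here
        String head = new String(data, 0, Math.min(data.length, 5), StandardCharsets.ISO_8859_1);
        return head.equals("%PDF-");
    }

    /**
     * Heuristic (assumption): an encrypted PDF has an /Encrypt key in its trailer
     * or cross-reference stream dictionary, which is not compressed.
     */
    static boolean hasEncryptEntry(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1).contains("/Encrypt");
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PdfProbe <url>");
            return;
        }
        try (InputStream in = new URL(args[0]).openStream()) {
            byte[] data = readAll(in);
            System.out.println("bytes read:   " + data.length);
            System.out.println("PDF header:   " + looksLikePdf(data));
            System.out.println("/Encrypt key: " + hasEncryptEntry(data));
        }
    }
}
```

Run it as `java PdfProbe <url>` with the same URL that `driver.getCurrentUrl()` returns. If the header check fails, the server is probably returning something other than a PDF (for example an HTML page); if `/Encrypt` is found, the file is most likely encrypted, which would match the INFO message.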