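I want to extract images from a PDF, for example a PDF file produced by a scanner. I am using PDFBox. What is the correct way to extract all images from a PDF? This is what I am doing at the moment: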
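import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

public static Map<String, BufferedImage> extractImagesFromPDF1(InputStream inputStream) {
    Map<String, BufferedImage> sheetMap = new HashMap<>();
    // try-with-resources closes the document even if rendering or writing fails
    try (PDDocument document = PDDocument.load(inputStream)) {
        PDFRenderer renderer = new PDFRenderer(document);
        for (int i = 0; i < document.getNumberOfPages(); i++) {
            // Note: this renders the whole page to a bitmap (72 DPI by default);
            // it does not pull out the image objects embedded in the page
            BufferedImage bi = renderer.renderImage(i);
            int pageNo = i + 1; // 1-based page number used in the file name and map key
            ImageIO.write(bi, "jpg", new File("/home/parveenparmar/Documents/imgs/img_1_" + pageNo + ".jpg"));
            sheetMap.put(String.valueOf(pageNo), bi);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return sheetMap;
}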
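
For comparison, here is a minimal sketch of reading the embedded image objects from each page's resources instead of rendering the pages. It assumes PDFBox 2.x (the same `PDDocument.load` API used above). The class name `EmbeddedImageExtractor` and the method names are hypothetical, chosen only for illustration. It walks image XObjects and descends into form XObjects; inline images and images referenced from patterns or annotations are not covered, and an image shared by several pages is returned once per page.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

// Hypothetical helper class, assuming PDFBox 2.x on the classpath.
class EmbeddedImageExtractor {

    // Returns the decoded image XObjects of every page, in page order.
    static List<BufferedImage> extractEmbeddedImages(InputStream in) throws IOException {
        List<BufferedImage> images = new ArrayList<>();
        try (PDDocument document = PDDocument.load(in)) {
            for (PDPage page : document.getPages()) {
                collectImages(page.getResources(), images);
            }
        }
        return images;
    }

    // Collects images from one resource dictionary, recursing into form XObjects.
    private static void collectImages(PDResources resources, List<BufferedImage> out)
            throws IOException {
        if (resources == null) {
            return; // a page or form may have no resources at all
        }
        for (COSName name : resources.getXObjectNames()) {
            PDXObject xObject = resources.getXObject(name);
            if (xObject instanceof PDImageXObject) {
                // Decode the embedded image at its native resolution
                out.add(((PDImageXObject) xObject).getImage());
            } else if (xObject instanceof PDFormXObject) {
                // Forms carry their own resources, which may contain more images
                collectImages(((PDFormXObject) xObject).getResources(), out);
            }
        }
    }
}
```

For a scanned PDF this would typically yield the scan of each page at its stored resolution, whereas `renderImage(i)` resamples the page at the requested DPI. Each returned `BufferedImage` can then be written out with `ImageIO.write` as in the code above.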