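I basically want to load a PDF with Apache PDFBox and convert it into a list of Base64 strings, one per page.
I tried the following code, but it is very slow. I don't need images as such; I just want Base64 data that I can pass to the front end.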
PDDocument document = PDDocument.loadNonSeq(new File("Random.pdf"), null);
@SuppressWarnings("unchecked")
List<PDPage> pdPages = document.getDocumentCatalog().getAllPages();
int page = 0;
List<String> base64DocumentPages = new ArrayList<>();
for (PDPage pdPage : pdPages)
{
    ++page;
    BufferedImage img = pdPage.convertToImage(BufferedImage.TYPE_INT_RGB, 300); // this is slow
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    ImageIOUtil.writeImage(img, ".png", os);
    String base64Page = Base64.getEncoder().encodeToString(os.toByteArray());
    base64DocumentPages.add(URLEncoder.encode(base64Page, "UTF-8"));
}
document.close();
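I'm using PDFBox to loop through the pages, but if you know of something better, I'm open to using anything.
PS: I really need the Base64 data of the pages separated in some kind of array.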
Answer 0 (score: 0)
Are you sure the convertToImage method is the slow part? In our case, the writeImage method takes the most time.
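One way to check which step dominates is to time the steps separately. The following is a minimal sketch using only the JDK; the `timeMillis` and `writePng` helpers and the blank stand-in image are illustrative assumptions, and in the real code the first timed step would be the PDFBox render call.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

class StepTimingSketch {

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Encodes the image with the JDK's default PNG writer. */
    static byte[] writePng(BufferedImage image) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(image, "png", out)) {
                throw new IllegalStateException("No PNG writer available");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for a rendered page (hypothetical): 2550 x 3300 pixels.
        BufferedImage[] holder = new BufferedImage[1];
        long renderMs = timeMillis(() ->
                holder[0] = new BufferedImage(2550, 3300, BufferedImage.TYPE_INT_RGB));
        long encodeMs = timeMillis(() -> writePng(holder[0]));
        System.out.println("render: " + renderMs + " ms, encode: " + encodeMs + " ms");
    }
}
```

A single run is only a rough indicator; repeating the measurement over several pages gives a more reliable picture.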
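The problem is that you are using the standard PNG writer. It has a flaw/bug: it always uses the best compression, at the expense of time. This was fixed in Java 9, and a backported version is available for earlier releases. So what do you need to do?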
(1) Add the following dependency to your Maven project (or add it manually if you don't use Maven):
<dependency>
    <groupId>net.gredler</groupId>
    <artifactId>jdk9-png-writer-backport</artifactId>
    <version>1.0.0</version>
</dependency>
(2) Make sure you are using PDFBox 2.x.
(3) Change your conversion code:
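import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Base64;
import java.util.Iterator;
import java.util.List;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
// plus PNGImageWriterBackport from the jdk9-png-writer-backport dependency

private static void convertMethod2(File pdf) {
    // try-with-resources closes the document, so no explicit close() is needed
    try (final PDDocument document = PDDocument.load(pdf)) {
        PDFRenderer pdfRenderer = new PDFRenderer(document);
        List<String> base64DocumentPages = new ArrayList<>();
        for (int page = 0; page < document.getNumberOfPages(); ++page) {
            BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 150, ImageType.RGB);
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            PNGImageWriterBackport writer = chosePngWriter();
            if (writer == null) {
                System.err.println("PNGImageWriterBackport not found! Aborting");
                return;
            }
            try (ImageOutputStream stream = new MemoryCacheImageOutputStream(baos)) {
                writer.setOutput(stream);
                writer.write(null, new IIOImage(bim, null, null), getImageParams(writer));
            }
            finally {
                writer.dispose();
            }
            // Base64-encode the PNG bytes of this page
            String base64Page = Base64.getEncoder().encodeToString(baos.toByteArray());
            base64DocumentPages.add(URLEncoder.encode(base64Page, "UTF-8"));
        }
    }
    catch (IOException e) {
        //handle exception
    }
}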
private static PNGImageWriterBackport chosePngWriter() {
    Iterator<ImageWriter> imageWriters = ImageIO.getImageWritersByFormatName("png");
    ImageWriter writer = null;
    while (imageWriters.hasNext()) {
        writer = imageWriters.next();
        if (writer instanceof PNGImageWriterBackport) {
            return (PNGImageWriterBackport) writer;
        }
    }
    return null;
}
private static ImageWriteParam getImageParams(PNGImageWriterBackport writer) {
    ImageWriteParam writeParam = writer.getDefaultWriteParam();
    //set compression mode which wasn't possible before
    writeParam.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
    //0.0f highest compression, slowest
    //1.0f lowest compression, fastest
    writeParam.setCompressionQuality(0.9f);
    return writeParam;
}
(4) Of course, lowering the DPI, e.g. to 150, also speeds up the process. But I know that's not always possible...
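To see why the DPI matters so much: PDF page sizes are given in points (1/72 inch), so the number of pixels to render and compress grows with the square of the DPI. A small sketch of that arithmetic, assuming a US Letter page of 612 x 792 points; the class and method names are illustrative and not part of PDFBox.

```java
class DpiPixelSketch {

    /** PDF user space uses 72 points per inch by default. */
    static final double POINTS_PER_INCH = 72.0;

    /** Pixel length of a side given in points, rendered at the given DPI. */
    static int pixels(double points, int dpi) {
        return (int) Math.round(points * dpi / POINTS_PER_INCH);
    }

    /** Total pixels of a page (width and height in points) at the given DPI. */
    static long pixelCount(double widthPt, double heightPt, int dpi) {
        return (long) pixels(widthPt, dpi) * pixels(heightPt, dpi);
    }

    public static void main(String[] args) {
        double w = 612, h = 792; // US Letter in points (assumed page size)
        for (int dpi : new int[] {150, 300}) {
            System.out.println(dpi + " DPI: " + pixels(w, dpi) + " x " + pixels(h, dpi)
                    + " = " + pixelCount(w, h, dpi) + " pixels");
        }
    }
}
```

Halving the DPI from 300 to 150 leaves a quarter of the pixels to render and compress.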
With the changes above, the conversion is a lot faster, at the cost of not getting the smallest possible PNG file size...
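On Java 9 or later the backport should not be needed, since the JDK's own PNG writer is expected to accept an explicit compression setting. Below is a minimal sketch under that assumption, using only the JDK. The names `encodePng` and `toBase64Pages` are illustrative, and the blank images stand in for pages rendered by PDFBox.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.Iterator;
import java.util.List;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;

class FastPngBase64Sketch {

    /** Encodes one image as PNG; quality near 1.0f favors speed over file size. */
    static byte[] encodePng(BufferedImage image, float quality) {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("png");
        if (!writers.hasNext()) {
            throw new IllegalStateException("No PNG writer available");
        }
        ImageWriter writer = writers.next();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream stream = new MemoryCacheImageOutputStream(bytes)) {
            ImageWriteParam param = writer.getDefaultWriteParam();
            // Assumption: the PNG writer supports explicit compression (Java 9+).
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(quality);
            writer.setOutput(stream);
            writer.write(null, new IIOImage(image, null, null), param);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose();
        }
        // The image stream is closed here, so all bytes have been flushed.
        return bytes.toByteArray();
    }

    /** Returns one Base64 string per page image, in page order. */
    static List<String> toBase64Pages(List<BufferedImage> pages, float quality) {
        List<String> result = new ArrayList<>();
        for (BufferedImage page : pages) {
            result.add(Base64.getEncoder().encodeToString(encodePng(page, quality)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<BufferedImage> pages = new ArrayList<>();
        pages.add(new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB));
        pages.add(new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB));
        List<String> base64 = toBase64Pages(pages, 0.9f);
        System.out.println(base64.size() + " pages encoded");
    }
}
```

The returned list keeps one entry per page, which matches the requirement of having the pages separated in an array. The standard Base64 alphabet contains `+`, `/` and `=`, so whether an extra URL-encoding step is needed depends on how the strings are sent to the front end.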