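PDFBox: writing compressed object streams

Asked: 2017-02-16 20:17:54

Tags: java pdfbox

I merged several files that originally totaled 19 MB.

But the result is 56 MB in total. How can I get the final size close to 19 MB? [Edit]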

public void concatena(InputStream anterior, InputStream novo, OutputStream saida, List<String> marcadores)
    throws IOException {
    PDFMergerUtility pdfMerger = new PDFMergerUtility();
    pdfMerger.setDestinationStream(saida);
    PDDocument dest;
    PDDocument src;
    MemoryUsageSetting setupMainMemoryOnly = MemoryUsageSetting.setupMainMemoryOnly();
    if (anterior != null) {                     
        dest = PDDocument.load(anterior, setupMainMemoryOnly);
        src = PDDocument.load(novo, setupMainMemoryOnly);
    } else {
        dest = PDDocument.load(novo, setupMainMemoryOnly);
        src = new PDDocument();
    }       
    int totalPages = dest.getNumberOfPages();   
    pdfMerger.appendDocument(dest, src);
    criaMarcador(dest, totalPages, marcadores);
    saida = pdfMerger.getDestinationStream();
    dest.save(saida);
    dest.close();
    src.close();
}

Sorry, I still don't know how to use Stack Overflow. I tried to post the rest of the code, but I got an error.

[Edit 2 - added the criaMarcador method]

private void criaMarcador(PDDocument src, int numPaginas, List<String> marcadores) {
    if (marcadores != null && !marcadores.isEmpty()) {
        PDDocumentOutline documentOutline = src.getDocumentCatalog().getDocumentOutline();          
        if (documentOutline == null) {
            documentOutline = new PDDocumentOutline();
        }
        PDPage page;
        if (src.getNumberOfPages() == numPaginas) {
            page = src.getPage(0);
        } else {
            page = src.getPage(numPaginas);
        }
        PDOutlineItem bookmark = null;
        PDOutlineItem pai = null;
        String etiquetaAnterior = null;
        for (String etiqueta : marcadores) {                
            bookmark = bookmark(pai != null ? pai : documentOutline, etiqueta);
            if (bookmark == null) {
                if (etiquetaAnterior != null && !etiquetaAnterior.equals(etiqueta) && pai == null) {
                    pai = bookmark(documentOutline, etiquetaAnterior);
                }
                bookmark = new PDOutlineItem();
                bookmark.setTitle(etiqueta);
                if (marcadores.indexOf(etiqueta) == marcadores.size() - 1) {
                    bookmark.setDestination(page);
                }
                if (pai != null) {
                    pai.addLast(bookmark);
                    pai.openNode();
                } else {
                    documentOutline.addLast(bookmark);
                }
            } else {
                pai = bookmark;
            }
            etiquetaAnterior = etiqueta;
        }   
        src.getDocumentCatalog().setDocumentOutline(documentOutline);           
    }       
}

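private PDOutlineItem bookmark(PDOutlineNode outline, String etiqueta) {
    PDOutlineItem current = outline.getFirstChild();
    while (current != null) {
        // Compare from the label side so an item without a title cannot cause an NPE.
        if (etiqueta.equals(current.getTitle())) {
            return current;
        }
        // Descend into the children and return a nested match if there is one.
        PDOutlineItem nested = bookmark(current, etiqueta);
        if (nested != null) {
            return nested;
        }
        current = current.getNextSibling();
    }
    return null;
}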

[Edit 3] Here is the code used for testing

public class PDFMergeTeste {


public static void main(String[] args) throws IOException {
    if (args.length == 1) {
        PDFMergeTeste teste = new PDFMergeTeste();
        teste.executa(args[0]);
    } else {
        System.err.println("Argumento tem que ser diretorio contendo arquivos .pdf com nomeclatura no padrão Autos");
    }
}

private void executa(String diretorioArquivos) throws IOException {
    File[] listFiles = new File(diretorioArquivos).listFiles((pathname) -> 
            pathname.getName().endsWith(".pdf") || pathname.getName().endsWith(".PDF"));
    List<File> lista = Arrays.asList(listFiles);
    lista.sort(Comparator.comparing(File::lastModified));
    PDFMerge merge = new PDFMerge();
    InputStream anterior = null;
    ByteArrayOutputStream saida = new ByteArrayOutputStream();
    for (File file : lista) {
        List<String> marcadores = marcadores(file.getName());           
        InputStream novo = new FileInputStream(file);           
        merge.concatena(anterior, novo, saida, marcadores);                     
        anterior = new ByteArrayInputStream(saida.toByteArray());
    }
    try (OutputStream pdf = new FileOutputStream(pathDestFile)) {
        saida.writeTo(pdf);
    }


}
private List<String> marcadores(String name) {
    // Strip the extension; lastIndexOf also handles the upper-case .PDF names the filter accepts.
    String semExtensao = name.substring(0, name.lastIndexOf('.'));
    return Arrays.asList(semExtensao.split("_"));
}

}

1 Answer:

Answer 0: (score: 1)

The error is in the executa method:

InputStream anterior = null;
ByteArrayOutputStream saida = new ByteArrayOutputStream();
for (File file : lista) {
    List<String> marcadores = marcadores(file.getName());           
    InputStream novo = new FileInputStream(file);           
    merge.concatena(anterior, novo, saida, marcadores);                     
    anterior = new ByteArrayInputStream(saida.toByteArray());
}

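Your ByteArrayOutputStream saida is reused in every loop iteration but is never cleared in between. As a result, it contains

  • after processing file 1:
    • file 1
  • after processing file 2:
    • file 1
    • the concatenation of file 1 and file 2
  • after processing file 3:
    • file 1
    • the concatenation of file 1 and file 2
    • the concatenation of file 1, file 2, and file 3
  • after processing file 4:
    • file 1
    • the concatenation of file 1 and file 2
    • the concatenation of file 1, file 2, and file 3
    • the concatenation of file 1, file 2, file 3, and file 4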

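(This actually only works because PDFBox tries to be nice and repairs broken input files. Strictly speaking, these concatenated files are broken, and PDFBox is under no obligation to be able to parse them.)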
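To get a feel for how quickly the buffer grows, here is a rough estimate. It assumes that each merged document is about as large as the sum of its inputs, which is only an approximation for real PDFs. The class name, method names, and sizes are hypothetical and exist only for illustration.

```java
class MergeSizeEstimate {

    // Estimated buffer size when the shared stream is never cleared:
    // every iteration appends one more merged document that contains
    // all files processed so far.
    static long withoutReset(long[] fileSizes) {
        long merged = 0; // estimated size of the latest merged document
        long buffer = 0; // everything accumulated in the shared stream
        for (long size : fileSizes) {
            merged += size;
            buffer += merged;
        }
        return buffer;
    }

    // Estimated buffer size when the stream is cleared before each merge:
    // only the last merged document remains.
    static long withReset(long[] fileSizes) {
        long merged = 0;
        for (long size : fileSizes) {
            merged += size;
        }
        return merged;
    }

    public static void main(String[] args) {
        long[] sizes = {1, 2, 3, 4}; // hypothetical file sizes in MB
        System.out.println(withoutReset(sizes)); // prints 20
        System.out.println(withReset(sizes));    // prints 10
    }
}
```

Under this assumption the uncleared buffer holds the sum of all prefix sums, so it grows roughly quadratically with the number of files, while the cleared buffer stays at the plain sum.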

You can fix this by clearing saida at the start of each iteration:

InputStream anterior = null;
ByteArrayOutputStream saida = new ByteArrayOutputStream();
for (File file : lista) {
    saida.reset();
    List<String> marcadores = marcadores(file.getName());           
    InputStream novo = new FileInputStream(file);           
    merge.concatena(anterior, novo, saida, marcadores);                     
    anterior = new ByteArrayInputStream(saida.toByteArray());
}

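With the original method, the result size for the test inputs was nearly 26 MB; with the fixed method it is about 5 MB, which is roughly the sum of the input file sizes.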
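The effect of reset() can also be checked without PDFBox. The following minimal sketch writes a few byte chunks to one shared ByteArrayOutputStream and records its size after each write, with and without clearing it first. The class name, method name, and chunk contents are hypothetical; the chunks merely stand in for the output of each merge step.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SaidaResetDemo {

    // Writes each chunk to one shared stream and returns the stream size
    // observed after every write.
    static int[] sizesAfterEachWrite(String[] chunks, boolean resetFirst) {
        ByteArrayOutputStream saida = new ByteArrayOutputStream();
        int[] sizes = new int[chunks.length];
        for (int i = 0; i < chunks.length; i++) {
            if (resetFirst) {
                saida.reset(); // discard what earlier iterations wrote
            }
            byte[] bytes = chunks[i].getBytes(StandardCharsets.US_ASCII);
            saida.write(bytes, 0, bytes.length);
            sizes[i] = saida.size();
        }
        return sizes;
    }

    public static void main(String[] args) {
        String[] chunks = {"aaaa", "bbbbbb", "cc"};
        // Without reset the sizes only ever grow.
        System.out.println(Arrays.toString(sizesAfterEachWrite(chunks, false))); // [4, 10, 12]
        // With reset each size reflects only the latest write.
        System.out.println(Arrays.toString(sizesAfterEachWrite(chunks, true)));  // [4, 6, 2]
    }
}
```

An alternative with the same effect would be to create a new ByteArrayOutputStream inside the loop instead of calling reset(); reset() simply reuses the buffer that is already allocated.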