I merged several files that originally totaled 19 MB.
But the result is 56 MB in total. How can I get the final size close to 19 MB? [Edit]
public void concatena(InputStream anterior, InputStream novo, OutputStream saida, List<String> marcadores)
        throws IOException {
    PDFMergerUtility pdfMerger = new PDFMergerUtility();
    pdfMerger.setDestinationStream(saida);
    PDDocument dest;
    PDDocument src;
    MemoryUsageSetting setupMainMemoryOnly = MemoryUsageSetting.setupMainMemoryOnly();
    if (anterior != null) {
        dest = PDDocument.load(anterior, setupMainMemoryOnly);
        src = PDDocument.load(novo, setupMainMemoryOnly);
    } else {
        dest = PDDocument.load(novo, setupMainMemoryOnly);
        src = new PDDocument();
    }
    int totalPages = dest.getNumberOfPages();
    pdfMerger.appendDocument(dest, src);
    criaMarcador(dest, totalPages, marcadores);
    saida = pdfMerger.getDestinationStream();
    dest.save(saida);
    dest.close();
    src.close();
}
Sorry, I still don't know how to use Stack Overflow properly. I tried to post the rest of the code, but I got an error.
[Edit 2 - added the criaMarcador method]
private void criaMarcador(PDDocument src, int numPaginas, List<String> marcadores) {
    if (marcadores != null && !marcadores.isEmpty()) {
        PDDocumentOutline documentOutline = src.getDocumentCatalog().getDocumentOutline();
        if (documentOutline == null) {
            documentOutline = new PDDocumentOutline();
        }
        PDPage page;
        if (src.getNumberOfPages() == numPaginas) {
            page = src.getPage(0);
        } else {
            page = src.getPage(numPaginas);
        }
        PDOutlineItem bookmark = null;
        PDOutlineItem pai = null;
        String etiquetaAnterior = null;
        for (String etiqueta : marcadores) {
            bookmark = bookmark(pai != null ? pai : documentOutline, etiqueta);
            if (bookmark == null) {
                if (etiquetaAnterior != null && !etiquetaAnterior.equals(etiqueta) && pai == null) {
                    pai = bookmark(documentOutline, etiquetaAnterior);
                }
                bookmark = new PDOutlineItem();
                bookmark.setTitle(etiqueta);
                if (marcadores.indexOf(etiqueta) == marcadores.size() - 1) {
                    bookmark.setDestination(page);
                }
                if (pai != null) {
                    pai.addLast(bookmark);
                    pai.openNode();
                } else {
                    documentOutline.addLast(bookmark);
                }
            } else {
                pai = bookmark;
            }
            etiquetaAnterior = etiqueta;
        }
        src.getDocumentCatalog().setDocumentOutline(documentOutline);
    }
}

private PDOutlineItem bookmark(PDOutlineNode outline, String etiqueta) {
    PDOutlineItem current = outline.getFirstChild();
    while (current != null) {
        if (current.getTitle().equals(etiqueta)) {
            return current;
        }
        bookmark(current, etiqueta);
        current = current.getNextSibling();
    }
    return current;
}
[Edit 3] Here is the code used for testing:
public class PDFMergeTeste {

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            PDFMergeTeste teste = new PDFMergeTeste();
            teste.executa(args[0]);
        } else {
            System.err.println("Argumento tem que ser diretorio contendo arquivos .pdf com nomeclatura no padrão Autos");
        }
    }

    private void executa(String diretorioArquivos) throws IOException {
        File[] listFiles = new File(diretorioArquivos).listFiles((pathname) ->
                pathname.getName().endsWith(".pdf") || pathname.getName().endsWith(".PDF"));
        List<File> lista = Arrays.asList(listFiles);
        lista.sort(Comparator.comparing(File::lastModified));
        PDFMerge merge = new PDFMerge();
        InputStream anterior = null;
        ByteArrayOutputStream saida = new ByteArrayOutputStream();
        for (File file : lista) {
            List<String> marcadores = marcadores(file.getName());
            InputStream novo = new FileInputStream(file);
            merge.concatena(anterior, novo, saida, marcadores);
            anterior = new ByteArrayInputStream(saida.toByteArray());
        }
        try (OutputStream pdf = new FileOutputStream(pathDestFile)) {
            saida.writeTo(pdf);
        }
    }

    private List<String> marcadores(String name) {
        String semExtensao = name.substring(0, name.indexOf(".pdf"));
        return Arrays.asList(semExtensao.split("_"));
    }
}
Answer 0: (score: 1)
The error is in the executa method:
InputStream anterior = null;
ByteArrayOutputStream saida = new ByteArrayOutputStream();
for (File file : lista) {
    List<String> marcadores = marcadores(file.getName());
    InputStream novo = new FileInputStream(file);
    merge.concatena(anterior, novo, saida, marcadores);
    anterior = new ByteArrayInputStream(saida.toByteArray());
}
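Your ByteArrayOutputStream saida is reused in every loop iteration but is never cleared in between. Since concatena saves the complete merged document to it each time, it ends up containing the output of every earlier iteration followed by the newest merged document, not just the latest result.
(Strictly speaking, such concatenations of whole PDF files are broken and PDFBox is not required to parse them; this only works at all because PDFBox tries to be helpful and repairs damaged input files.)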
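The growth can be reproduced without PDFBox. The following is a simplified model, not the actual merge: it assumes each pass writes a "merged document" whose size is the sum of the inputs so far, and the class and method names (BufferGrowthDemo, bufferSizesWithoutReset) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;

class BufferGrowthDemo {

    // Simplified model of the loop in executa: each pass appends a "merged
    // document" (here just zero bytes) to a buffer that is never cleared.
    static int[] bufferSizesWithoutReset(int[] inputSizes) {
        ByteArrayOutputStream saida = new ByteArrayOutputStream();
        int[] sizes = new int[inputSizes.length];
        int mergedSize = 0;
        for (int i = 0; i < inputSizes.length; i++) {
            mergedSize += inputSizes[i];             // merged output covers all inputs so far
            byte[] merged = new byte[mergedSize];
            saida.write(merged, 0, merged.length);   // appended after the earlier outputs
            sizes[i] = saida.size();
        }
        return sizes;
    }

    public static void main(String[] args) {
        for (int size : bufferSizesWithoutReset(new int[] {10, 20, 30})) {
            System.out.println(size);
        }
    }
}
```

For inputs of 10, 20 and 30 bytes the buffer ends at 100 bytes, although a single merged document in this model would only be 60 bytes.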
You can fix this by clearing saida at the start of each iteration:
InputStream anterior = null;
ByteArrayOutputStream saida = new ByteArrayOutputStream();
for (File file : lista) {
    // Discard the previous iteration's output; anterior already holds a copy of it.
    saida.reset();
    List<String> marcadores = marcadores(file.getName());
    InputStream novo = new FileInputStream(file);
    merge.concatena(anterior, novo, saida, marcadores);
    anterior = new ByteArrayInputStream(saida.toByteArray());
}
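With your original method, the result for my input files was nearly 26 MB; with the fixed method it is about 5 MB, which is roughly the sum of the input file sizes.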
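Calling reset() at the top of the loop is safe because toByteArray() returns a copy, so the anterior stream created at the end of one pass stays readable after the buffer is cleared. A minimal sketch using only the JDK; the class name ResetSnapshotDemo and the string payloads are illustrative, and readAllBytes assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ResetSnapshotDemo {

    // Returns { bytes read from the earlier snapshot, current buffer content }.
    static String[] run() throws IOException {
        ByteArrayOutputStream saida = new ByteArrayOutputStream();
        saida.write("first".getBytes(StandardCharsets.UTF_8));

        // Snapshot taken at the end of one "iteration".
        InputStream anterior = new ByteArrayInputStream(saida.toByteArray());

        // Start of the next "iteration": clear the buffer, then write new output.
        saida.reset();
        saida.write("second".getBytes(StandardCharsets.UTF_8));

        String fromSnapshot = new String(anterior.readAllBytes(), StandardCharsets.UTF_8);
        String fromBuffer = new String(saida.toByteArray(), StandardCharsets.UTF_8);
        return new String[] {fromSnapshot, fromBuffer};
    }

    public static void main(String[] args) throws IOException {
        String[] result = run();
        System.out.println(result[0]);
        System.out.println(result[1]);
    }
}
```

The snapshot still yields the earlier bytes, while the buffer holds only what was written after reset().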