PDFBox PDFMergerUtility: how can I tell which sources failed?

Date: 2016-02-12 23:12:01

Tags: java exception pdfbox unparseable

So, I'm doing this:

PDFMergerUtility mergePdf = new PDFMergerUtility();

for (int i = 0; i < filePaths.size(); i++) 
    mergePdf.addSource(filePaths.get(i));

mergePdf.setDestinationFileName(tempFile.getAbsolutePath()); 
mergePdf.mergeDocuments();

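This works well until it hits an unparseable-PDF exception (either a corrupt PDF or something PDFBox just can't handle). It doesn't happen often.

I'd like to be able to tell which sources it failed on, exclude them from a subsequent merge, and tell the user which documents failed.

Can this be done?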

Update:

Here is my exception:

java.io.IOException: Error: Expected a long type at offset 591535, instead got 'E^TQE^TQE^TQE^TQ[...]^TQE^TQE^T<84>f<96><8a>'
    at org.apache.pdfbox.pdfparser.BaseParser.readLong(BaseParser.java:1695)
    at org.apache.pdfbox.pdfparser.BaseParser.readObjectNumber(BaseParser.java:1623)
    at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:614)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:203)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1220)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1187)
    at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:237)
    at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:194)
    at myapp.util.DocumentImage.combinePDFs(DocumentImage.java:289)
    at myapp.webapp.download.DownloadLatestForCLO.generate(DownloadLatestForCLO.java:73)
    at myapp.webapp.download.DownloadLatestForCLO.getFileSize(DownloadLatestForCLO.java:64)
    at myapp.webapp.download.DownloadServlet.handleRequest(DownloadServlet.java:58)
    at myapp.webapp.download.DownloadServlet.doGet(DownloadServlet.java:32)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:621)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:200)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

1 Answer

Answer 0 (score: 0)

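Fortunately, PDFBox is open source, so I downloaded the latest source (2.0.0 RC3 at the time of writing). In the file \pdfbox-2.0.0-RC3\pdfbox\src\main\java\org\apache\pdfbox\multipdf\PDFMergerUtility.java (line 188) we can see that it lets this exception propagate from the lower-level parser without catching it and adding details about which file caused the error.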

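Until that is fixed, you will have to catch this error in your own code and then iterate over the source files, loading and closing each one, until you find the one(s) PDFBox can't handle, and report those yourself.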
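A minimal sketch of that workaround, assuming "can PDFBox handle this file?" is tested by opening the file and closing it again. `SourceChecker`, `SourceLoader`, `CheckResult` and `check` are hypothetical helpers, not PDFBox API; the demo plugs in a fake loader so it runs without PDFBox on the classpath.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SourceChecker {

    /** Hypothetical callback: open one source, close it again, throw IOException if it cannot be parsed. */
    @FunctionalInterface
    public interface SourceLoader {
        void loadAndClose(String path) throws IOException;
    }

    /** Outcome of the pre-check, kept in the order of the input list. */
    public static final class CheckResult {
        public final List<String> good = new ArrayList<>();
        public final Map<String, IOException> failed = new LinkedHashMap<>();
    }

    /** Tries every source once; a failure on one path does not stop the remaining paths from being checked. */
    public static CheckResult check(List<String> paths, SourceLoader loader) {
        CheckResult result = new CheckResult();
        for (String path : paths) {
            try {
                loader.loadAndClose(path);
                result.good.add(path);
            } catch (IOException e) {
                result.failed.put(path, e); // keep the exception so its message can be shown to the user
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in loader so the demo runs without PDFBox: any path containing "corrupt" fails to "parse".
        SourceLoader fake = path -> {
            if (path.contains("corrupt")) {
                throw new IOException("Error: Expected a long type at offset 591535");
            }
        };
        CheckResult r = check(Arrays.asList("a.pdf", "corrupt.pdf", "b.pdf"), fake);
        System.out.println("merge: " + r.good);             // prints merge: [a.pdf, b.pdf]
        System.out.println("failed: " + r.failed.keySet()); // prints failed: [corrupt.pdf]
    }
}
```

With PDFBox on the classpath, the loader could plausibly be `path -> PDDocument.load(new File(path)).close()` (PDFBox 1.8/2.0 style), after which only `r.good` is passed to `addSource` and `r.failed` is reported to the user. Since this parses every file twice, one option is to run the check only after `mergeDocuments()` has thrown.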

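If you are interested in fixing the problem at the source (inside PDFBox), this is the edit to make and submit to the PDFBox project team. Once that fix is merged into a release and you have upgraded to it, you can safely remove the iteration code: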

        try
        {
            MemoryUsageSetting partitionedMemSetting = memUsageSetting != null ? 
                    memUsageSetting.getPartitionedCopy(sources.size()+1) :
                    MemoryUsageSetting.setupMainMemoryOnly();
            Iterator<InputStream> sit = sources.iterator();
            destination = new PDDocument(partitionedMemSetting);

            while (sit.hasNext())
            {
                sourceFile = sit.next();
                source = PDDocument.load(sourceFile, partitionedMemSetting);
                tobeclosed.add(source);
                appendDocument(destination, source);
            }
            if (destinationStream == null)
            {
                destination.save(destinationFileName);
            }
            else
            {
                destination.save(destinationStream);
            }
        }

        catch (IOException e)
        {
            /* insert code here that wraps e as the inner exception and throws
               a new exception naming the 'sourceFile' that failed */
        }
        finally
        {
            ...
        }
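The catch block above is only a placeholder. A minimal sketch of the idea outside PDFBox, assuming each source can be paired with a display name (in the 2.0 excerpt the sources are plain `InputStream`s, so a name would have to be tracked alongside each one). `SourceLoadException`, `SourceStep` and `processAll` are hypothetical names, not PDFBox API; `SourceStep` stands in for the `PDDocument.load(...)` plus `appendDocument(...)` step.

```java
import java.io.IOException;
import java.util.Arrays;

class NamedSourceMerge {

    /** Hypothetical exception type that carries the name of the source that could not be processed. */
    public static class SourceLoadException extends IOException {
        private final String sourceName;

        public SourceLoadException(String sourceName, IOException cause) {
            // Keep the original exception as the cause so the parser's stack trace is not lost.
            super("Failed to load source '" + sourceName + "': " + cause.getMessage(), cause);
            this.sourceName = sourceName;
        }

        public String getSourceName() {
            return sourceName;
        }
    }

    /** Hypothetical per-source step standing in for loading and appending one document. */
    @FunctionalInterface
    public interface SourceStep {
        void process(String sourceName) throws IOException;
    }

    /** Processes sources in order; the first failure is rethrown with the offending source's name attached. */
    public static void processAll(Iterable<String> sourceNames, SourceStep step) throws SourceLoadException {
        for (String name : sourceNames) {
            try {
                step.process(name);
            } catch (IOException e) {
                throw new SourceLoadException(name, e);
            }
        }
    }

    public static void main(String[] args) {
        try {
            processAll(Arrays.asList("a.pdf", "broken.pdf", "c.pdf"), name -> {
                if (name.startsWith("broken")) {
                    throw new IOException("Expected a long type at offset 591535");
                }
            });
        } catch (SourceLoadException e) {
            System.out.println(e.getSourceName()); // prints broken.pdf
        }
    }
}
```

Because the wrapper extends `IOException`, existing callers that already catch `IOException` keep working, while callers that care can catch the subtype and read the failing source's name.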