Sejda cannot split a large PDF

Date: 2019-02-26 23:03:58

Tags: pdf sejda pdf-split

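I am trying to split a 2,000-page PDF (8,482,816 bytes), but I get the error below. Is there a limit on how many pages sejda can handle? I get the same error with pdfsam, which apparently uses sejda as its backend.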

C:\Users\tomas.greif>C:\Users\xxx\Desktop\split\sejda-console-3.2.67\bin\sejda-console splitbyevery -f "C:\Users\tomas.greif\Desktop\split\01-04-18 2.pdf" -o C:\Users\tomas.greif\Desktop\split\out -n 1 -p [CURRENTPAGE#####]
Configuring Sejda 3.2.67
Document root element "sejda", must match DOCTYPE root "null".
Document is invalid: no grammar found.
Starting execution with arguments: 'splitbyevery -f C:\Users\xxx\Desktop\split\01-04-18 2.pdf -o C:\Users\xxx\Desktop\split\out -n 1 -p [CURRENTPAGE#####]'
Java version: '1.8.0_191'
Validating parameters.
Starting task (org.sejda.impl.sambox.SplitByPageNumbersTask@62379589) execution.
Opening C:\Users\xxx\Desktop\split\01-04-18 2.pdf
Found 0 inherited images and 0 inherited fonts potentially unused
Starting split by page numbers for org.sejda.model.parameter.SplitByEveryXPagesParameters@50a638b5[step=1,optimizationPolicy=AUTO,discardOutline=false,outputPrefix=[CURRENTPAGE#####],output=org.sejda.model.output.FileOrDirectoryTaskOutput@1189dd52[C:\Users\tomas.greif\Desktop\split\out],sourceList=[C:\Users\tomas.greif\Desktop\split\01-04-18 2.pdf],compress=true,version=VERSION_1_6,existingOutputPolicy=FAIL,lenient=false,1]
Starting split at page 1 of the original document
Created output temporary buffer C:\Users\xxx\Desktop\split\out\.sejdaTmp8635042219708410613.tmp
Task progress: 0% done
Filtering annotations
Skipped acroform merge, nothing to merge
Ending split at page 1 of the original document, generated document size is 11.68 KB
Starting split at page 2 of the original document
Created output temporary buffer C:\Users\xxx\Desktop\split\out\.sejdaTmp4677890173653656260.tmp
Exception in thread "main" java.lang.StackOverflowError
        at java.util.Spliterator.getExactSizeIfKnown(Unknown Source)
        at java.util.stream.AbstractPipeline.copyInto(Unknown Source)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(Unknown Source)
        at java.util.stream.AbstractPipeline.evaluate(Unknown Source)
        at java.util.stream.ReferencePipeline.collect(Unknown Source)
        at org.sejda.sambox.pdmodel.PDPageTree.getKids(PDPageTree.java:172)
        at org.sejda.sambox.pdmodel.PDPageTree.get(PDPageTree.java:318)
        at org.sejda.sambox.pdmodel.PDPageTree.get(PDPageTree.java:327)
        at org.sejda.sambox.pdmodel.PDPageTree.get(PDPageTree.java:327)
        at org.sejda.sambox.pdmodel.PDPageTree.get(PDPageTree.java:327)
        at org.sejda.sambox.pdmodel.PDPageTree.get(PDPageTree.java:327)

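Other PDFs from the same source that have fewer pages split without problems. I also tried splitting every 500 or 1,000 pages instead of every page, but got the same error. My guess is that this is related to the page count, since the file itself is not large.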
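The repeated `PDPageTree.get(PDPageTree.java:327)` frames in the trace show a recursive descent through the page tree. One possible reading, which is an assumption and not confirmed for this file, is that the recursion goes deeper than the thread stack allows. The sketch below is a toy model only: `PageTreeSketch`, `Node`, `buildChain`, `getRecursive` and `getIterative` are hypothetical names, not Sejda or SAMBox code, and the depth used is far larger than 2,000 purely to provoke the error on a default stack. It contrasts a recursive lookup (one stack frame per tree level) with an iterative one (constant stack use).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a PDF page tree; not Sejda/SAMBox code.
public class PageTreeSketch {

    // A node is either a leaf (one page) or an intermediate node with kids.
    static final class Node {
        final List<Node> kids = new ArrayList<>(2);
        int count = 1; // number of leaf pages below this node; 1 for a leaf

        boolean isLeaf() {
            return kids.isEmpty();
        }
    }

    // Builds a degenerate tree: every level holds one page plus the rest of
    // the document, so the nesting depth grows with the page count.
    static Node buildChain(int pages) {
        Node root = new Node(); // leaf for the last page
        for (int i = 1; i < pages; i++) {
            Node parent = new Node();
            parent.kids.add(new Node()); // one page at this level
            parent.kids.add(root);       // remaining pages, nested one level down
            parent.count = root.count + 1;
            root = parent;
        }
        return root;
    }

    private static void checkIndex(Node node, int index) {
        if (index < 0 || index >= node.count) {
            throw new IndexOutOfBoundsException("page index out of range: " + index);
        }
    }

    // Recursive lookup: uses one stack frame per tree level.
    static Node getRecursive(Node node, int index) {
        checkIndex(node, index);
        if (node.isLeaf()) {
            return node;
        }
        for (Node kid : node.kids) {
            if (index < kid.count) {
                return getRecursive(kid, index);
            }
            index -= kid.count;
        }
        throw new IllegalStateException("inconsistent page counts");
    }

    // Iterative lookup: same walk, but stack use does not depend on depth.
    static Node getIterative(Node root, int index) {
        checkIndex(root, index);
        Node node = root;
        while (!node.isLeaf()) {
            Node next = null;
            for (Node kid : node.kids) {
                if (index < kid.count) {
                    next = kid;
                    break;
                }
                index -= kid.count;
            }
            if (next == null) {
                throw new IllegalStateException("inconsistent page counts");
            }
            node = next;
        }
        return node;
    }

    public static void main(String[] args) {
        Node root = buildChain(200_000);
        int last = root.count - 1; // deepest page in the chain
        System.out.println("iterative lookup found a leaf: " + getIterative(root, last).isLeaf());
        try {
            getRecursive(root, last);
            System.out.println("recursive lookup succeeded");
        } catch (StackOverflowError e) {
            // Likely with a default-sized stack, but it depends on the JVM settings.
            System.out.println("recursive lookup overflowed the stack");
        }
    }
}
```

If something like this is the cause, the failure would depend on how the page tree is nested and not on the file size in bytes, which would fit the observation that smaller PDFs from the same source split fine.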

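Update: I was able to use PDFTKBuilder, which apparently uses a different underlying library, and it split the PDF without problems. So this may be a bug or a limitation in sejda.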
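If the limitation is stack depth and not the page count as such, a general JVM-level workaround for a `StackOverflowError` is to give the recursing thread a larger stack, either with the JVM's `-Xss` option or with the `Thread` constructor that takes a stack size. Whether this helps with this particular PDF, and how sejda-console's launcher would pass such an option, is not confirmed here. The sketch below only illustrates the mechanism: `BigStackSketch`, `depth` and `depthWithStack` are hypothetical names, and the JVM is allowed to treat the requested stack size as a hint.

```java
// Hypothetical illustration of running deep recursion on a larger stack;
// not related to Sejda's own code.
public class BigStackSketch {

    // Deliberately recursive: one stack frame per level.
    static int depth(int remaining) {
        return remaining == 0 ? 0 : 1 + depth(remaining - 1);
    }

    // Runs depth(levels) on a worker thread created with the requested stack
    // size in bytes. Returns -1 if the worker died (for example from a
    // StackOverflowError) before storing a result.
    static int depthWithStack(int levels, long stackBytes) {
        int[] result = {-1};
        Thread worker = new Thread(null, () -> result[0] = depth(levels), "big-stack", stackBytes);
        worker.start();
        try {
            worker.join(); // join makes the worker's write to result visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        // 128 MB is an arbitrary illustrative size, not a recommendation.
        System.out.println(depthWithStack(200_000, 128L * 1024 * 1024));
    }
}
```

A larger stack only raises the depth at which the recursion fails. If the page tree of the document contains a cycle, no stack size would be enough, and the fix would have to be in the library.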

0 answers:

No answers