java.lang.OutOfMemoryError when validating a PDF with PDFBox Preflight 2.0.13

Date: 2019-01-30 23:23:13

Tags: java apache out-of-memory pdfbox

PDFBOX-4450 Details on Issue

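I am not sure whether anyone else has run into this, but I get an out-of-memory exception when validating a PDF. I am posting it here for visibility; any help would be great.

If anyone has any ideas, please share. At this point I really cannot move forward.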

  

What I have tried

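  • Followed the suggestions in the wiki (PDFBox faq), without success

  • Increased the maximum heap size from 2 GB to 4 GB

  • Removed the JVM argument -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider

  • Tried JDK 1.7

  • Used temp files (from the wiki)
  • Disabled caching for PDImageXObject (from the wiki)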
  

My environment

  • Linux 64-bit (Arch Linux)
  • Java 8
  • PDFBox / Preflight version: 2.0.13
  • jbig imageio version: 3.0.2
  

Java info

java -version

java version "1.8.0_131"

Java(TM) SE Runtime Environment (build 1.8.0_131-b11)

Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

  

JVM args used

java -Xmx2048m -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider

  

Sample PDF

Pdf from PDFBOX-4450

  

Console output

Jan 30, 2019 10:25:58 AM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
WARNING: Using fallback font ArialMT for base font Symbol
Jan 30, 2019 10:25:58 AM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
WARNING: Using fallback font ArialMT for base font ZapfDingbats
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1531)
at org.apache.pdfbox.preflight.xobject.XObjFormValidator.checkGroup(XObjFormValidator.java:138)
at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validate(XObjFormValidator.java:73)
at org.apache.pdfbox.preflight.process.reflect.GraphicObjectPageValidationProcess.validate(GraphicObjectPageValidationProcess.java:74)
at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)
at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:57)
at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateXObjects(ResourcesValidationProcess.java:224)
at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:81)
at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)
  

Sample code

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.preflight.PreflightDocument;
import org.apache.pdfbox.preflight.ValidationResult;
import org.apache.pdfbox.preflight.ValidationResult.ValidationError;
import org.apache.pdfbox.preflight.parser.PreflightParser;

public class Validator {
  private File file = null;
  private List<ValidationError> errorList = new ArrayList<ValidationError>();

  public Validator(File file) {
    this.file = file;
  }

  public List<ValidationError> getErrors(){
    return errorList;
  }

  // Returns true if at least one validation error was found.
  public boolean validate() throws Exception {
    PreflightDocument document = null;
    try {
      PreflightParser parser = new PreflightParser(file);
      parser.parse();
      document = parser.getPreflightDocument();
      document.validate();
      ValidationResult result = document.getResult();
      errorList = result.getErrorsList();
    } finally {
      if (document != null) {
        try {
          document.close();
        } catch (Exception ignored) {
          // Nothing useful to do if closing fails.
        }
      }
    }
    return !errorList.isEmpty();
  }
}

1 answer:

Answer 0 (score: 2)

When I added the following options:

-XX:+HeapDumpOnOutOfMemoryError -Xmx3550m -Xms3550m -Xmn2g 

it failed again. I analyzed the heap dump file with VisualVM and found something interesting.

heap dump file

And most of the char[] content is:

char[] content

I found the code in:
//org.apache.pdfbox.preflight.process.reflect.SinglePageValidationProcess#validateGroupTransparency
    protected void validateGroupTransparency(PreflightContext context, PDPage page) throws ValidationException
    {
        COSBase baseGroup = page.getCOSObject().getItem(XOBJECT_DICTIONARY_KEY_GROUP);
        COSDictionary groupDictionary = COSUtils.getAsDictionary(baseGroup, context.getDocument().getDocument());
        if (groupDictionary != null)
        {
            String sVal = groupDictionary.getNameAsString(COSName.S);
            if (XOBJECT_DICTIONARY_VALUE_S_TRANSPARENCY.equals(sVal))
            {
                context.addValidationError(new ValidationError(ERROR_GRAPHIC_TRANSPARENCY_GROUP,
                        "Group has a transparency S entry or the S entry is null"));
            }
        }
    }

It creates a ValidationError object, and the constructor is:

public ValidationError(String errorCode, String details, Throwable cause)
        {
            this(errorCode);
            if (details != null)
            {
                StringBuilder sb = new StringBuilder(this.details.length() + details.length() + 2);
                sb.append(this.details).append(", ").append(details);
                this.details = sb.toString();
            }
            this.cause = cause;
            t = new Exception();
        }

As you can see, every time an error occurs it creates a ValidationError, which in turn creates a StringBuilder and a new details String.
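To illustrate why that matters when the same error is reported many times, here is a minimal sketch that only mimics the constructor above. `FakeError` and its field are hypothetical stand-ins, not PDFBox classes. Each instance ends up holding its own copy of an identical details string:

```java
final class FakeError {
    final String details;

    FakeError(String base, String extra) {
        // Same pattern as the constructor above: a fresh String is built per instance.
        StringBuilder sb = new StringBuilder(base.length() + extra.length() + 2);
        sb.append(base).append(", ").append(extra);
        this.details = sb.toString();
    }

    public static void main(String[] args) {
        FakeError a = new FakeError("Transparency group", "Group has a transparency S entry");
        FakeError b = new FakeError("Transparency group", "Group has a transparency S entry");
        // Compares the text of the two details strings.
        System.out.println("equal text:  " + a.details.equals(b.details));
        // Compares object identity: StringBuilder.toString() allocates a new String each time.
        System.out.println("same object: " + (a.details == b.details));
    }
}
```

If the details text is long and the error is raised many times, those duplicate copies are what fill the heap with char[] data.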

So you can solve the problem in three ways:

  1. Increase the heap size. 4 GB is not enough; try 16 GB or more.
  2. Don't use the PDFBox library.
  3. Change the PDFBox source code:
    // commonDetailMap is assumed to be a static Map<String, String> field
    // added to ValidationError; its declaration is not shown here.
    public ValidationError(String errorCode, String details, Throwable cause)
    {
        this(errorCode);
        if (details != null)
        {
            String key = errorCode + details;
            if (commonDetailMap.containsKey(key)) {
                // Reuse the details string built for an earlier, identical error.
                this.details = commonDetailMap.get(key);
            } else {
                StringBuilder sb = new StringBuilder(this.details.length() + details.length() + 2);
                sb.append(this.details).append(", ").append(details);
                this.details = sb.toString();
                commonDetailMap.put(key, this.details);
            }
        }
        this.cause = cause;
        t = new Exception();
    }

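I think using a Map to avoid keeping a separate details String per error would also work. However, if the error codes and details take many distinct values, the Map will grow too large.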
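One possible way to keep such a Map from growing without bound, shown purely as a sketch, is to cap its size and evict the least recently used entry. `DetailCache`, `canonical` and the entry limit are made-up names for illustration and are not part of PDFBox:

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DetailCache {
    private final int maxEntries;
    private final Map<String, String> cache;

    DetailCache(final int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Called after each put; evicts the eldest entry once the cap is exceeded.
                return size() > DetailCache.this.maxEntries;
            }
        };
    }

    /** Returns one shared String instance for each distinct combined details text. */
    synchronized String canonical(String base, String details) {
        String combined = base + ", " + details;
        String cached = cache.get(combined);
        if (cached == null) {
            cache.put(combined, combined);
            cached = combined;
        }
        return cached;
    }

    synchronized int size() {
        return cache.size();
    }
}
```

With a cap in place the cache only saves memory for errors that repeat often, which is the case that matters here.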

So another way to change the source code is:

    public ValidationError(String errorCode, String details, Throwable cause)
    {
        this(errorCode);
        if (details != null)
        {
            StringBuilder sb = new StringBuilder(this.details.length() + details.length() + 2);
            sb.append(this.details).append(", ").append(details);
            // invoke intern
            this.details = sb.toString().intern();
        }
        this.cause = cause;
        t = new Exception();
    }

According to its documentation, intern():

Returns a canonical representation for the string object.

I think using intern() is the better option.
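A small standalone sketch of what intern() changes, using only the JDK. `InternDemo` and `build` are illustrative names, not PDFBox code:

```java
final class InternDemo {
    /** Builds the combined text the same way the constructor does. */
    static String build(String base, String details) {
        StringBuilder sb = new StringBuilder(base.length() + details.length() + 2);
        sb.append(base).append(", ").append(details);
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = build("Transparency group", "Group has a transparency S entry");
        String b = build("Transparency group", "Group has a transparency S entry");
        // Without intern(): two separate String objects with the same text.
        System.out.println("same object before intern: " + (a == b));
        // With intern(): both calls return the single canonical instance from the JVM's string pool.
        System.out.println("same object after intern:  " + (a.intern() == b.intern()));
    }
}
```

The temporary String is still created on every call, but it becomes garbage right away, so only one copy per distinct message stays reachable.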