Best way to handle a large field with GSON JsonReader

Time: 2016-09-21 11:45:19

Tags: java gson out-of-memory

I get a java.lang.OutOfMemoryError: Java heap space, even when using GSON streaming.

{"result":"OK","base64":"JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PC...."}

The base64 value can be up to 200 MB long. GSON uses far more memory than that (3 GB), and when I try to store the base64 value in a variable I get:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2367)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
    at java.lang.StringBuilder.append(StringBuilder.java:204)
    at com.google.gson.stream.JsonReader.nextQuotedValue(JsonReader.java:1014)
    at com.google.gson.stream.JsonReader.nextString(JsonReader.java:815)

What is the best way to handle a field like this?

1 answer:

Answer 0 (score: 4)

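You get the OutOfMemoryError because GSON's nextString() returns the value as a single String, which JsonReader aggregates in a StringBuilder (nextQuotedValue in the stack trace). For a literal of this size, the builder plus the copies made each time it grows (Arrays.copyOf in the trace) no longer fit in the heap. In a case like this, the only option is to process the data in intermediate pieces as it is read. Unfortunately, GSON offers no way to process huge string literals like that.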
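To make "processing the intermediate data" concrete, here is a minimal JDK-only sketch that reads a single JSON string literal from a Reader and hands its unescaped content to a callback in fixed-size chunks, so memory use does not grow with the length of the literal. It is not GSON code: QuotedStringSlicer, SliceListener and readQuoted are hypothetical names, and the sketch assumes the Reader is already positioned just after the opening quote and that chunkSize is positive. It reads one char at a time, so in practice the Reader should be buffered (for example with a BufferedReader).

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper, not part of GSON: streams one JSON string literal in bounded chunks.
final class QuotedStringSlicer {

    // Receives a window of a reused char buffer; the buffer must not be retained.
    interface SliceListener {
        void accept(char[] buffer, int start, int length) throws IOException;
    }

    private QuotedStringSlicer() {
    }

    // Assumes 'in' is positioned just after the opening '"'. Consumes up to and including
    // the closing '"', forwarding unescaped content in chunks of at most chunkSize chars.
    static void readQuoted(final Reader in, final int chunkSize, final SliceListener listener)
            throws IOException {
        final char[] chunk = new char[chunkSize];
        int n = 0;
        while ( true ) {
            int c = in.read();
            if ( c == -1 ) {
                throw new IOException("Unterminated string");
            }
            if ( c == '"' ) {
                break;
            }
            if ( c == '\\' ) {
                c = readEscape(in);
            }
            chunk[n++] = (char) c;
            if ( n == chunk.length ) {
                // Hand the full chunk over and reuse the array: memory stays O(chunkSize)
                listener.accept(chunk, 0, n);
                n = 0;
            }
        }
        if ( n > 0 ) {
            listener.accept(chunk, 0, n);
        }
    }

    // Decodes the character following a backslash, per the JSON escape rules.
    private static int readEscape(final Reader in) throws IOException {
        final int e = in.read();
        switch ( e ) {
        case '"':
        case '\\':
        case '/':
            return e;
        case 'b':
            return '\b';
        case 'f':
            return '\f';
        case 'n':
            return '\n';
        case 'r':
            return '\r';
        case 't':
            return '\t';
        case 'u':
            // Four hex digits form one UTF-16 code unit
            int value = 0;
            for ( int i = 0; i < 4; i++ ) {
                final int digit = Character.digit(in.read(), 16);
                if ( digit < 0 ) {
                    throw new IOException("Malformed unicode escape");
                }
                value = (value << 4) | digit;
            }
            return value;
        default:
            throw new IOException("Invalid escape character: " + (char) e);
        }
    }
}
```

The design mirrors the idea used below: the callback sees slices of one reused buffer, and nothing ever builds the whole value as a String.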

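I'm not sure whether you can change the response payload, but if you can't, you may need to implement your own JSON reader, or "hack" the existing JsonReader so that it works in a streaming fashion. The example below is based on GSON 2.5 and makes heavy use of reflection, because JsonReader carefully hides its state.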

EnhancedGson25JsonReader.java

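import com.google.gson.stream.JsonReader;

import java.io.IOException;
import java.io.Reader;

// Relies on GSON 2.5 internals via reflection; private names may differ in other versions
final class EnhancedGson25JsonReader
        extends JsonReader {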

    // A listener to accept the internal character buffers.
    // Accepting a single string built on such buffers is total memory waste as well.
    interface ISlicedStringListener {

        void accept(char[] buffer, int start, int length)
                throws IOException;

    }

    // The constants are private in JsonReader, so they are simply copied here

    /** @see JsonReader#PEEKED_NONE */
    private static final int PEEKED_NONE = 0;

    /** @see JsonReader#PEEKED_SINGLE_QUOTED */
    private static final int PEEKED_SINGLE_QUOTED = 8;

    /** @see JsonReader#PEEKED_DOUBLE_QUOTED */
    private static final int PEEKED_DOUBLE_QUOTED = 9;

    // A bunch of "spies" that give access to the parent class's private state

    private final FieldSpy<Integer> peeked;
    private final MethodSpy<Integer> doPeek;
    private final MethodSpy<Integer> getLineNumber;
    private final MethodSpy<Integer> getColumnNumber;
    private final FieldSpy<char[]> buffer;
    private final FieldSpy<Integer> pos;
    private final FieldSpy<Integer> limit;
    private final MethodSpy<Character> readEscapeCharacter;
    private final FieldSpy<Integer> lineNumber;
    private final FieldSpy<Integer> lineStart;
    private final MethodSpy<Boolean> fillBuffer;
    private final MethodSpy<IOException> syntaxError;
    private final FieldSpy<Integer> stackSize;
    private final FieldSpy<int[]> pathIndices;

    private EnhancedGson25JsonReader(final Reader reader)
            throws NoSuchFieldException, NoSuchMethodException {
        super(reader);
        peeked = FieldSpy.spyField(JsonReader.class, this, "peeked");
        doPeek = MethodSpy.spyMethod(JsonReader.class, this, "doPeek");
        getLineNumber = MethodSpy.spyMethod(JsonReader.class, this, "getLineNumber");
        getColumnNumber = MethodSpy.spyMethod(JsonReader.class, this, "getColumnNumber");
        buffer = FieldSpy.spyField(JsonReader.class, this, "buffer");
        pos = FieldSpy.spyField(JsonReader.class, this, "pos");
        limit = FieldSpy.spyField(JsonReader.class, this, "limit");
        readEscapeCharacter = MethodSpy.spyMethod(JsonReader.class, this, "readEscapeCharacter");
        lineNumber = FieldSpy.spyField(JsonReader.class, this, "lineNumber");
        lineStart = FieldSpy.spyField(JsonReader.class, this, "lineStart");
        fillBuffer = MethodSpy.spyMethod(JsonReader.class, this, "fillBuffer", int.class);
        syntaxError = MethodSpy.spyMethod(JsonReader.class, this, "syntaxError", String.class);
        stackSize = FieldSpy.spyField(JsonReader.class, this, "stackSize");
        pathIndices = FieldSpy.spyField(JsonReader.class, this, "pathIndices");
    }

    static EnhancedGson25JsonReader getEnhancedGson25JsonReader(final Reader reader) {
        try {
            return new EnhancedGson25JsonReader(reader);
        } catch ( final NoSuchFieldException | NoSuchMethodException ex ) {
            // Thrown if the GSON version on the classpath lacks the expected 2.5 internals
            throw new RuntimeException(ex);
        }
    }

    // This method has been copied and reworked from the nextString() implementation

    void nextSlicedString(final ISlicedStringListener listener)
            throws IOException {
        int p = peeked.get();
        if ( p == PEEKED_NONE ) {
            p = doPeek.get();
        }
        switch ( p ) {
        case PEEKED_SINGLE_QUOTED:
            nextQuotedSlicedValue('\'', listener);
            break;
        case PEEKED_DOUBLE_QUOTED:
            nextQuotedSlicedValue('"', listener);
            break;
        default:
            throw new IllegalStateException("Expected a string but was " + peek()
                    + " at line " + getLineNumber.get()
                    + " column " + getColumnNumber.get()
                    + " path " + getPath()
            );
        }
        peeked.accept(PEEKED_NONE);
        pathIndices.get()[stackSize.get() - 1]++;
    }

    // The following method is also copied from JsonReader and patched to use the "spies".
    // In principle it is the same as the original, except for one extra buffer, singleCharBuffer,
    // which avoids adding a second method to ISlicedStringListener (so it stays lambda-friendly).
    // The main difference is that this method does not aggregate a single string value;
    // it just hands the internal buffers over to the call site, which may do anything with them
    // (but must not keep a reference to them, because the buffers are reused).

    /**
     * @see JsonReader#nextQuotedValue(char)
     */
    private void nextQuotedSlicedValue(final char quote, final ISlicedStringListener listener)
            throws IOException {
        final char[] buffer = this.buffer.get();
        final char[] singleCharBuffer = new char[1];
        while ( true ) {
            int p = pos.get();
            int l = limit.get();
            int start = p;
            while ( p < l ) {
                final int c = buffer[p++];
                if ( c == quote ) {
                    pos.accept(p);
                    listener.accept(buffer, start, p - start - 1);
                    return;
                } else if ( c == '\\' ) {
                    pos.accept(p);
                    listener.accept(buffer, start, p - start - 1);
                    singleCharBuffer[0] = readEscapeCharacter.get();
                    listener.accept(singleCharBuffer, 0, 1);
                    p = pos.get();
                    l = limit.get();
                    start = p;
                } else if ( c == '\n' ) {
                    lineNumber.accept(lineNumber.get() + 1);
                    lineStart.accept(p);
                }
            }
            listener.accept(buffer, start, p - start);
            pos.accept(p);
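            // fillBuffer(1) returns false when the input ends before the closing quote
            if ( !fillBuffer.apply(just1) ) {
                // In GSON 2.5, syntaxError() throws the MalformedJsonException itself, so it actually
                // escapes from the reflective call (as an unchecked exception) before this throw runs
                throw syntaxError.apply(justUnterminatedString);
            }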
        }
    }

    // Shared argument arrays, so no new Object[] is allocated on every reflective call

    private static final Object[] just1 = { 1 };
    private static final Object[] justUnterminatedString = { "Unterminated string" };

}

FieldSpy.java

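import java.lang.reflect.Field;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Reads and writes one (possibly private) field of a given instance via reflection
final class FieldSpy<T>
        implements Supplier<T>, Consumer<T> {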

    private final Object instance;
    private final Field field;

    private FieldSpy(final Object instance, final Field field) {
        this.instance = instance;
        this.field = field;
    }

    static <T> FieldSpy<T> spyField(final Class<?> declaringClass, final Object instance, final String fieldName)
            throws NoSuchFieldException {
        final Field field = declaringClass.getDeclaredField(fieldName);
        field.setAccessible(true);
        return new FieldSpy<>(instance, field);
    }

    @Override
    public T get() {
        try {
            @SuppressWarnings("unchecked")
            final T value = (T) field.get(instance);
            return value;
        } catch ( final IllegalAccessException ex ) {
            throw new RuntimeException(ex);
        }
    }

    @Override
    public void accept(final T value) {
        try {
            field.set(instance, value);
        } catch ( final IllegalAccessException ex ) {
            throw new RuntimeException(ex);
        }
    }

}

MethodSpy.java

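import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.function.Function;
import java.util.function.Supplier;

// Invokes one (possibly private) method of a given instance via reflection
final class MethodSpy<T>
        implements Function<Object[], T>, Supplier<T> {

    private static final Object[] emptyObjectArray = {};

    private final Object instance;
    private final Method method;

    private MethodSpy(final Object instance, final Method method) {
        this.instance = instance;
        this.method = method;
    }

    static <T> MethodSpy<T> spyMethod(final Class<?> declaringClass, final Object instance, final String methodName, final Class<?>... parameterTypes)
            throws NoSuchMethodException {
        final Method method = declaringClass.getDeclaredMethod(methodName, parameterTypes);
        method.setAccessible(true);
        return new MethodSpy<>(instance, method);
    }

    @Override
    public T get() {
        // Pass a shared empty array rather than letting javac allocate a new Object[0] per call
        return apply(emptyObjectArray);
    }

    @Override
    public T apply(final Object[] arguments) {
        try {
            @SuppressWarnings("unchecked")
            final T value = (T) method.invoke(instance, arguments);
            return value;
        } catch ( final InvocationTargetException ex ) {
            // Unwrap what the spied method threw (e.g. an IOException from fillBuffer or syntaxError)
            final Throwable cause = ex.getCause();
            if ( cause instanceof RuntimeException ) {
                throw (RuntimeException) cause;
            }
            if ( cause instanceof Error ) {
                throw (Error) cause;
            }
            if ( cause instanceof IOException ) {
                // Function.apply() cannot declare checked exceptions
                throw new UncheckedIOException((IOException) cause);
            }
            throw new RuntimeException(cause);
        } catch ( final IllegalAccessException ex ) {
            throw new RuntimeException(ex);
        }
    }

}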

HugeJsonReaderDemo.java

Here is a demo that uses this approach to read a huge JSON document and redirect its string values to another file.

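import com.google.gson.stream.JsonToken;

import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class HugeJsonReaderDemo {

    public static void main(final String... args)
            throws IOException {
        // Explicit UTF-8 (JSON's default encoding) instead of the platform default charset
        try ( final EnhancedGson25JsonReader input = EnhancedGson25JsonReader.getEnhancedGson25JsonReader(
                      new InputStreamReader(new FileInputStream("./huge.json"), StandardCharsets.UTF_8));
              final Writer output = new OutputStreamWriter(
                      new BufferedOutputStream(new FileOutputStream("./huge.json.STRINGS")), StandardCharsets.UTF_8) ) {
            // Walks a flat top-level object; other tokens (arrays, numbers, ...) are not handled
            while ( input.hasNext() ) {
                final JsonToken token = input.peek();
                switch ( token ) {
                case BEGIN_OBJECT:
                    input.beginObject();
                    break;
                case NAME:
                    input.nextName();
                    break;
                case STRING:
                    // Every slice of the string value is written straight to the output file
                    input.nextSlicedString(output::write);
                    break;
                default:
                    throw new AssertionError(token);
                }
            }
        }
    }
}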
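Since the base64 value in the question is presumably going to be decoded anyway (JVBERi0 is the base64 encoding of %PDF-), the slices could also be decoded on the fly instead of being written out as text. The following is a minimal sketch of such a listener using java.util.Base64; the class name Base64SliceDecoder and its finish() method are hypothetical, and it assumes the value is plain standard base64 without line breaks. Because a slice can end in the middle of a 4-character base64 group, up to 3 leftover characters are carried over to the next call.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical listener: decodes base64 text arriving in arbitrary slices and writes the bytes out.
final class Base64SliceDecoder {

    private final OutputStream out;
    private final Base64.Decoder decoder = Base64.getDecoder();
    // Up to 3 leftover chars that did not yet form a complete 4-char base64 group
    private final byte[] carry = new byte[3];
    private int carryLength;

    Base64SliceDecoder(final OutputStream out) {
        this.out = out;
    }

    // Same (char[], int, int) shape as the sliced-string callback, so decoder::accept fits it
    void accept(final char[] buffer, final int start, final int length) throws IOException {
        final int total = carryLength + length;
        final byte[] ascii = new byte[total];
        System.arraycopy(carry, 0, ascii, 0, carryLength);
        for ( int i = 0; i < length; i++ ) {
            // The base64 alphabet is pure ASCII, so a char-to-byte cast is lossless here
            ascii[carryLength + i] = (byte) buffer[start + i];
        }
        // Decode only whole 4-char groups; keep the remainder for the next slice
        final int whole = total - total % 4;
        if ( whole > 0 ) {
            out.write(decoder.decode(Arrays.copyOf(ascii, whole)));
        }
        carryLength = total - whole;
        System.arraycopy(ascii, whole, carry, 0, carryLength);
    }

    // Call once after the string has ended; complete base64 input leaves nothing behind
    void finish() throws IOException {
        if ( carryLength != 0 ) {
            throw new IOException("Truncated base64 input: " + carryLength + " dangling chars");
        }
        out.flush();
    }
}
```

With the reader from this answer it could be plugged in as input.nextSlicedString(decoder::accept) for the base64 value, followed by decoder.finish(); the demo would then need to remember the last name returned by nextName() to tell the two string values apart. Each call allocates a small array bounded by the slice size, not by the size of the whole value.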

I successfully extracted the field above into a file. The input file is 544 MB (570,425,371 bytes) long and was generated from the following JSON pieces:

  • {"result":"OK","base64":"
  • JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PC × 16777216 (2^24)
  • "}

The result (since I simply redirected all strings to the file) was:

  • OK
  • JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PC × 16777216 (2^24)
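To reproduce the measurement, the input file described above could be generated with a streaming writer like the following sketch (a hypothetical HugeJsonGenerator class, not part of the original answer). It also accounts for the file size: 25 prefix characters plus 34 × 2^24 chunk characters plus 2 suffix characters gives 570,425,371 bytes in ASCII.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical generator for the test input described above.
final class HugeJsonGenerator {

    static final String PREFIX = "{\"result\":\"OK\",\"base64\":\"";
    static final String CHUNK = "JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PC";
    static final String SUFFIX = "\"}";

    private HugeJsonGenerator() {
    }

    // Writes PREFIX, then CHUNK 'repeats' times, then SUFFIX, streaming so nothing large is held in memory
    static void generate(final Path path, final int repeats) throws IOException {
        try ( final Writer out = Files.newBufferedWriter(path, StandardCharsets.US_ASCII) ) {
            out.write(PREFIX);
            for ( int i = 0; i < repeats; i++ ) {
                out.write(CHUNK);
            }
            out.write(SUFFIX);
        }
    }

    public static void main(final String... args) throws IOException {
        // 1 << 24 == 16777216 repetitions: 25 + 34 * 16777216 + 2 == 570425371 bytes
        generate(Paths.get("./huge.json"), 1 << 24);
    }
}
```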

I think you are facing a very interesting problem. It would be nice to get some feedback from the GSON team on a possible API enhancement.