Infinite recursion when emitting nested TableRows in Google Cloud Dataflow

Asked: 2017-07-10 22:23:27

Tags: google-bigquery google-cloud-dataflow

I'm trying to pass a TableRow that I generate between pipeline stages, and I get the following error:

Exception in thread "main" 
com.google.cloud.dataflow.sdk.Pipeline$PipelineExecutionException: 
java.lang.IllegalArgumentException: Forbidden IOException when writing to OutputStream 
[... exception propagation ...] 
Caused by: com.fasterxml.jackson.databind.JsonMappingException: 
Infinite recursion (StackOverflowError) (through reference chain: 
com.google.protobuf.Descriptors$Descriptor["file"]
->com.google.protobuf.Descriptors$FileDescriptor["messageTypes"]
->java.util.Collections$UnmodifiableRandomAccessList[0]-> 
[... many, many lines of this ...]
at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:733)
    at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:155)
    at com.fasterxml.jackson.databind.ser.impl.IndexedListSerializer.serializeContentsUsing(IndexedListSerializer.java:142)
    at com.fasterxml.jackson.databind.ser.impl.IndexedListSerializer.serializeContents(IndexedListSerializer.java:88)
    at com.fasterxml.jackson.databind.ser.impl.IndexedListSerializer.serialize(IndexedListSerializer.java:79)
    at com.fasterxml.jackson.databind.ser.impl.IndexedListSerializer.serialize(IndexedListSerializer.java:18)
    at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:727)
    at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:717)
    at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:155)
    at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:727)
    at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:717)
    at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:155)
    at com.fasterxml.jackson.databind.ser.impl.IndexedListSerializer.serializeContentsUsing(IndexedListSerializer.java:142)
[... many, many lines of this ...]
Caused by: java.lang.StackOverflowError
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:736)
    at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:155)
    at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:727)
    at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:717)
    at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:155)
    at com.fasterxml.jackson.databind.ser.impl.IndexedListSerializer.serializeContentsUsing(IndexedListSerializer.java:142)
    at com.fasterxml.jackson.databind.ser.impl.IndexedListSerializer.serializeContents(IndexedListSerializer.java:88)
    at com.fasterxml.jackson.databind.ser.impl.IndexedListSerializer.serialize(IndexedListSerializer.java:79)
    at com.fasterxml.jackson.databind.ser.impl.IndexedListSerializer.serialize(IndexedListSerializer.java:18)
[... snip ...]
at com.fasterxml.jackson.databind.ser.impl.IndexedListSerializer.serialize(IndexedListSerializer.java:79)
    at com.fasterxml.jackson.databind.ser.impl.IndexedListSerializer.serialize(IndexedListSerializer.java:18)
    at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:727)
    at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:717)
    at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:155)

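I build my TableRow recursively from a Google protobuf via its descriptor: I walk the descriptor depth-first (since protobufs can have nested definitions) and build the TableRow as I go. Here is an excerpt from the class that creates the TableRow: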

public void processElement(ProcessContext c) throws Exception {
    TableRow row = getTableRow(c.element());
    LOG.info(row.toPrettyString());
    c.output(row);
}

private TableRow getTableRow(TMessage message) throws Exception {
    TableRow row = new TableRow();
    encode(message, row);
    return row;
}

private TableCell getTableCell(TMessage message) throws Exception {
    TableCell cell = new TableCell();
    encode(message, cell);
    return cell;
}

private void encode(TMessage message, GenericJson row) throws Exception {
    Descriptors.Descriptor descriptor = message.getDescriptorForType();
    List<Descriptors.FieldDescriptor> fields = descriptor.getFields();

    for (Descriptors.FieldDescriptor fieldDescriptor : fields) {
        Descriptors.FieldDescriptor.Type fieldType = fieldDescriptor.getType();

        switch (fieldType) {
            case DOUBLE:
            case FLOAT:
            case INT64:
            case UINT64:
            case INT32:
            case FIXED64:
            case FIXED32:
            case UINT32:
            case SFIXED32:
            case SFIXED64:
            case SINT32:
            case SINT64:
            case BOOL:
            case STRING:
            case BYTES:
            case ENUM:
                if (fieldDescriptor.isRepeated()) {
                    List<Object> tableCells = new ArrayList<>();

                    tableCells.addAll((List<?>) message.getField(fieldDescriptor));

                    row.set(fieldDescriptor.getName(), tableCells);
                }
                else {
                    row.set(fieldDescriptor.getName(), message.getField(fieldDescriptor));
                }

                break;
            case MESSAGE:
                if (fieldDescriptor.isRepeated()) {
                    List<TableRow> tableRows = new ArrayList<>();
                    for (Object o : (List<?>) message.getField(fieldDescriptor)) {
                        TMessage nestedMessage = (TMessage) o;
                        TableRow tableRow = getTableRow(nestedMessage);
                        tableRows.add(tableRow);
                    }
                    row.set(fieldDescriptor.getName(), tableRows);
                }
                else {
                    row.set(fieldDescriptor.getName(), getTableCell((TMessage) message.getField(fieldDescriptor)));
                }
                break;
            case GROUP:
                throw new Exception("groups are deprecated");
        }

    }
}

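I believe the TableRow is being created correctly: I tested this DoFn with some simple dummy data and inspected the TableRows created for a subset of my dataset (see the snippet above, where I LOG.info the encoded TableRow), and the resulting TableRow appears to contain all the data I expect, with no extra fields.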

1 Answer:

Answer 0: (score: 2)

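Based on the stack trace and the code, it looks like something in the protocol buffer message may be self-referential. JSON encoding fails while following those references.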

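Looking at the code, my guess is that you are hitting an enum. If you look at the protocol buffer documentation for getField, it returns an EnumValueDescriptor for enum fields.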

Looking at EnumValueDescriptor, it has a link to a FileDescriptor, which contains a link to an EnumDescriptor, which contains a link to a FileDescriptor, which contains a list of all EnumDescriptors, each of which links to the FileDescriptor, and so on.

If you handle the ENUM case separately (specifically, to keep protos from showing up as values in the JSON map), it should fix your problem.
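
As a rough sketch of what handling ENUM separately could look like: the idea is to store the enum value's name (a plain String) in the row instead of the descriptor object. To keep the example runnable without the protobuf library, FileDesc, EnumDesc and EnumValueDesc below are hypothetical stand-ins that only imitate the back-references described above; they are not the real com.google.protobuf.Descriptors classes, and toCellValue is an illustrative helper name. In the question's encode method, the analogous change would presumably be to give ENUM its own case, cast the result of message.getField(fieldDescriptor) to Descriptors.EnumValueDescriptor (or to a list of them for repeated fields), and store getName().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for FileDescriptor: knows every enum type declared in the file.
class FileDesc {
    final List<EnumDesc> enumTypes = new ArrayList<>();
}

// Stand-in for EnumDescriptor: holds a back-reference to its file.
class EnumDesc {
    final FileDesc file;

    EnumDesc(FileDesc file) {
        this.file = file;
        file.enumTypes.add(this); // file -> enum -> file -> ... is a cycle
    }
}

// Stand-in for EnumValueDescriptor: value -> enum type -> file -> enum type ...
class EnumValueDesc {
    final String name;
    final EnumDesc type;

    EnumValueDesc(String name, EnumDesc type) {
        this.name = name;
        this.type = type;
    }

    String getName() {
        return name;
    }
}

class EnumCellSketch {
    // Turns a field value into something a JSON encoder can walk safely:
    // enum values become their names, lists are converted element by element,
    // and everything else is passed through unchanged.
    static Object toCellValue(Object value) {
        if (value instanceof EnumValueDesc) {
            return ((EnumValueDesc) value).getName();
        }
        if (value instanceof List) {
            List<Object> converted = new ArrayList<>();
            for (Object element : (List<?>) value) {
                converted.add(toCellValue(element));
            }
            return converted;
        }
        return value;
    }

    public static void main(String[] args) {
        FileDesc file = new FileDesc();
        EnumDesc color = new EnumDesc(file);
        EnumValueDesc red = new EnumValueDesc("RED", color);
        EnumValueDesc blue = new EnumValueDesc("BLUE", color);

        // The object graph really is cyclic: following the links returns to the start.
        System.out.println(red.type.file.enumTypes.get(0) == red.type);

        // Singular and repeated enum fields both end up as plain strings.
        System.out.println(toCellValue(red));
        System.out.println(toCellValue(Arrays.asList(red, blue)));
    }
}
```

Storing the name keeps the row readable in BigQuery; storing the numeric value instead would be an equally valid choice if the target column is an integer.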