AVRO模式演变添加带有默认值的可选列无法反序列化

时间:2019-07-29 15:05:41

标签: avro

在阅读avro文档(例如[1])时,我了解到支持架构演变,如果我添加具有指定默认值的列,则该列应向后兼容(甚至当我再次删除它时也可以向前兼容)。听起来不错,所以我添加了定义为的列:

        {
          "name": "newColumn",
          "type": ["null","string"],
          "default": null,
          "doc": "something wrong"
        }

并尝试从一开始就使用具有该架构的某些主题,它会失败并显示以下消息:

Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
    at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:424)
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
    at tech.allegro.schema.json2avro.converter.JsonAvroConverter.convertToJson(JsonAvroConverter.java:83)

提供更多信息。 Avro模式定义了一种顶级类型,具有2个字段。描述消息类型的字符串,以及N个类型的并集。可以读取所有N-1个未修改的类型,但是无法读取用可选的具有默认值的列更新的一种。我不确定这种设计严格来说是否正确,但这不是重点(可以批评和推荐更好的方法!)。我正在进行模式演变,似乎 无法正常工作。

我做错什么了吗?

[1] https://docs.oracle.com/database/nosql-12.1.3.4/GettingStartedGuide/schemaevolution.html#changeschema-rules

编辑: 如果我们将类型定义更改为:

"type": "string",
"default": ""

它仍然不起作用,并且生成的错误是:

Caused by: org.apache.avro.AvroRuntimeException: Malformed data. Length is negative: -1
    at org.apache.avro.io.BinaryDecoder.doReadBytes(BinaryDecoder.java:336)
    at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:263)
    at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:201)
    at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:422)
    at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:414)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:181)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
    at tech.allegro.schema.json2avro.converter.JsonAvroConverter.convertToJson(JsonAvroConverter.java:83)

哪些代码确实会导致给定的失败:

BinaryDecoder binaryDecoder = DecoderFactory.get().binaryDecoder(avro, (BinaryDecoder)null);
GenericRecord record = (GenericRecord)(new GenericDatumReader(schema)).read((Object)null, binaryDecoder);

1 个答案:

答案 0 :(得分:2)

关于模式演化及其工作方式通常存在一些误解。演化模式时,这并不意味着您不需要“ writer”模式来读取avro数据。为此,您应该使用以下构造函数GenericDatumReader

public GenericDatumReader(Schema writer,
                  Schema reader)

如您所见,必须存在写程序模式(用于序列化avro数据的模式)和读程序模式(您的“进化”模式)。有几种库/工具(Hive,Spark)对此进行了抽象,但这仅是可能的,因为文件本身包含架构(非架构少)

相关问题