Avro schema evolution: tests and questions

Time: 2021-04-08 22:13:30

Tags: avro

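Using the Avro schemas and test code defined below, I have a couple of questions about Avro schema evolution, specifically about storing data written with the first version of a schema and later reading it back with the second version. In my example, Person.avsc is the first version and PersonWithMiddleName.avsc is the second version, in which the middleName field has been added.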

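  1. Is there a way to store the Avro schema together with the binary-encoded data as a byte array in Java? We want to store Avro objects in DynamoDB, and we would like to store the Avro data as a blob with the schema stored right beside it (as it is when written to a file). For reference, look at my test output below (the binary content was not copied, so that line only reads The Person is now serialized to a byte array: JoeCool) and compare what is produced when the Person is serialized to a byte array with what is written to the person.avro file during the test. As you can see, the schema appears to be written out only with the file, not with the byte array.
  2. Is the AvroTypeException I get during the test really expected, as I noted in the catch block of the test? In this case I serialized the Person object to JSON and tried to deserialize it as a PersonWithMiddleName.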

Java test code

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import org.apache.avro.AvroTypeException;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonDecoder;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SchemaEvolutionTest {
  Logger log = LoggerFactory.getLogger(this.getClass());

  @Test
  public void createAndReadPerson() {
    // Create the Person using the Person schema
    var person = new Person();
    person.setFirstName("Joe");
    person.setLastName("Cool");
    log.info("Person has been created: {}", person);
    SpecificDatumWriter<Person> personSpecificDatumWriter =
        new SpecificDatumWriter<Person>(Person.class);
    DataFileWriter<Person> dataFileWriter = new DataFileWriter<Person>(personSpecificDatumWriter);
    try {
      dataFileWriter.create(person.getSchema(), new File("person.avro"));
      dataFileWriter.append(person);
      dataFileWriter.close();
    } catch (IOException e) {
      Assertions.fail();
    }
    log.info("Person has been written to an Avro file");

    // ******************************************************************************************************

    // Next, read as Person from the Avro file using the Person schema
    DatumReader<Person> personDatumReader =
        new SpecificDatumReader<Person>(Person.getClassSchema());
    var personAvroFile = new File("person.avro");

    DataFileReader<Person> personDataFileReader = null;
    try {
      personDataFileReader = new DataFileReader<Person>(personAvroFile, personDatumReader);
    } catch (IOException e1) {
      Assertions.fail();
    }
    Person personReadFromFile = null;
    while (personDataFileReader.hasNext()) {
      // Reuse object by passing it to next(). This saves us from
      // allocating and garbage collecting many objects for files with
      // many items.
      try {
        personReadFromFile = personDataFileReader.next(person);
      } catch (IOException e) {
        Assertions.fail();
      }
    }
    log.info("Person read from the file: {}", personReadFromFile.toString());

    // ******************************************************************************************************

    // Read the Person from the Person file as PersonWithMiddleName using only the
    // PersonWithMiddleName schema
    DatumReader<PersonWithMiddleName> personWithMiddleNameDatumReader =
        new SpecificDatumReader<PersonWithMiddleName>(PersonWithMiddleName.getClassSchema());
    DataFileReader<PersonWithMiddleName> personWithMiddleNameDataFileReader = null;
    try {
      personWithMiddleNameDataFileReader =
          new DataFileReader<PersonWithMiddleName>(personAvroFile, personWithMiddleNameDatumReader);
    } catch (IOException e1) {
      Assertions.fail();
    }
    PersonWithMiddleName personWithMiddleName = null;
    while (personWithMiddleNameDataFileReader.hasNext()) {
      // Reuse object by passing it to next(). This saves us from
      // allocating and garbage collecting many objects for files with
      // many items.
      try {
        personWithMiddleName = personWithMiddleNameDataFileReader.next(personWithMiddleName);
      } catch (IOException e) {
        Assertions.fail();
      }
    }
    log.info(
        "Now a PersonWithMiddleName has been read from the file that was written as a Person: {}",
        personWithMiddleName.toString());

    // ******************************************************************************************************

    // Serialize the Person to a byte array
    byte[] personByteArray = new byte[0];
    ByteArrayOutputStream personByteArrayOutputStream = new ByteArrayOutputStream();
    Encoder encoder = null;
    try {
      encoder = EncoderFactory.get().binaryEncoder(personByteArrayOutputStream, null);
      personSpecificDatumWriter.write(person, encoder);
      encoder.flush();
      personByteArray = personByteArrayOutputStream.toByteArray();
    } catch (IOException e) {
      log.error("Serialization error:" + e.getMessage());
    }
    log.info("The Person is now serialized to a byte array: {}", new String(personByteArray));

    // ******************************************************************************************************

    // Deserialize the Person byte array into a Person object
    BinaryDecoder binaryDecoder = null;
    Person decodedPerson = null;
    try {
      binaryDecoder = DecoderFactory.get().binaryDecoder(personByteArray, null);
      decodedPerson = personDatumReader.read(null, binaryDecoder);
    } catch (IOException e) {
      log.error("Deserialization error:" + e.getMessage());
    }
    log.info("Decoded Person from byte array {}", decodedPerson.toString());

    // ******************************************************************************************************

    // Deserialize the Person byte array into a PersonWithMiddleName object
    PersonWithMiddleName decodedPersonWithMiddleName = null;
    try {
      binaryDecoder = DecoderFactory.get().binaryDecoder(personByteArray, null);
      decodedPersonWithMiddleName = personWithMiddleNameDatumReader.read(null, binaryDecoder);
    } catch (IOException e) {
      log.error("Deserialization error:" + e.getMessage());
    }
    log.info(
        "Decoded PersonWithMiddleName from byte array {}", decodedPersonWithMiddleName.toString());

    // ******************************************************************************************************

    // Serialize the Person to JSON
    byte[] jsonByteArray = new byte[0];
    personByteArrayOutputStream = new ByteArrayOutputStream();
    Encoder jsonEncoder = null;
    try {
      jsonEncoder =
          EncoderFactory.get().jsonEncoder(Person.getClassSchema(), personByteArrayOutputStream);
      personSpecificDatumWriter.write(person, jsonEncoder);
      jsonEncoder.flush();
      jsonByteArray = personByteArrayOutputStream.toByteArray();
    } catch (IOException e) {
      log.error("Serialization error:" + e.getMessage());
    }
    log.info("The Person is now serialized to JSON: {}", new String(jsonByteArray));

    // ******************************************************************************************************

    // Deserialize the Person JSON into a Person object
    JsonDecoder jsonDecoder = null;
    try {
      jsonDecoder =
          DecoderFactory.get().jsonDecoder(Person.getClassSchema(), new String(jsonByteArray));
      decodedPerson = personDatumReader.read(null, jsonDecoder);
    } catch (IOException e) {
      log.error("Deserialization error:" + e.getMessage());
    }
    log.info("Decoded Person from JSON: {}", decodedPerson.toString());

    // ******************************************************************************************************

    // Deserialize the Person JSON into a PersonWithMiddleName object
    try {
      jsonDecoder =
          DecoderFactory.get()
              .jsonDecoder(PersonWithMiddleName.getClassSchema(), new String(jsonByteArray));
      decodedPersonWithMiddleName = personWithMiddleNameDatumReader.read(null, jsonDecoder);
    } catch (AvroTypeException ae) {
      // Do nothing. We expect this since JSON didn't serialize anything out.
      log.error(
          "An AvroTypeException occurred trying to deserialize Person JSON back into a PersonWithMiddleName. Here's the exception: {}",ae.getMessage());
    } catch (Exception e) {
      log.error("Deserialization error:" + e.getMessage());
    }

  }
}

Person.avsc

{
    "type": "record",
    "namespace": "org.acme.avro_testing",
    "name": "Person",
    "fields": [
        {
            "name": "firstName",
            "type": ["null", "string"],
            "default": null
        },
        {
            "name": "lastName",
            "type": ["null", "string"],
            "default": null
        }
    ]
}

PersonWithMiddleName.avsc

{
    "type": "record",
    "namespace": "org.acme.avro_testing",
    "name": "PersonWithMiddleName",
    "fields": [
        {
            "name": "firstName",
            "type": ["null", "string"],
            "default": null
        },
        {
            "name": "middleName",
            "type": ["null", "string"],
            "default": null
        },
        {
            "name": "lastName",
            "type": ["null", "string"],
            "default": null
        }
    ]
}

Test output

Person has been created: {"firstName": "Joe", "lastName": "Cool"}
Person has been written to an Avro file
Person read from the file: {"firstName": "Joe", "lastName": "Cool"}
Now a PersonWithMiddleName has been read from the file that was written as a Person: {"firstName": "Joe", "middleName": null, "lastName": "Cool"}
The Person is now serialized to a byte array: JoeCool
Decoded Person from byte array {"firstName": "Joe", "lastName": "Cool"}
Decoded PersonWithMiddleName from byte array {"firstName": "Joe", "middleName": null, "lastName": "Cool"}
The Person is now serialized to JSON: {"firstName":{"string":"Joe"},"lastName":{"string":"Cool"}}
Decoded Person from JSON: {"firstName": "Joe", "lastName": "Cool"}
An AvroTypeException occurred trying to deserialize Person JSON back into a PersonWithMiddleName. Here's the exception: Expected field name not found: middleName

person.avro

Objavro.schema�{"type":"record","name":"Person","namespace":"org.acme.avro_testing","fields":[{"name":"firstName","type":["null","string"],"default":null},{"name":"lastName","type":["null","string"],"default":null}]}

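1 answer:

Answer 0 (score: 0)

For the first question: I am not a Java expert, but Python has the concept of a file-like object, which has the same interface as a file but writes to a bytes buffer instead of an actual file. For example, instead of doing this: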

file = open(file_name, "wb")
# use avro library to write to file
file.close()

you can do this:

from io import BytesIO
bytes_interface = BytesIO()
# use bytes_interface the same way you would the previous "file" object
byte_output = bytes_interface.getvalue()

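In the end, byte_output holds the bytes that would normally have been written to the file, but it is now just a bytes buffer that you can store anywhere. Does Java have a similar concept? Alternatively, if you absolutely have to go through writing an actual temporary file, I assume Java has some way to read the file contents back into a byte buffer.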
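
On the Java side, the closest counterpart to BytesIO is java.io.ByteArrayOutputStream, and Avro's DataFileWriter has a create(Schema, OutputStream) overload, so the container format (schema header included) can be written into memory instead of a file. The plain binaryEncoder used in the question's test writes only the datum, which is why no schema shows up in that byte array. The following is a minimal, standard-library-only sketch of the two approaches mentioned above. The class name and the writePayload helper are hypothetical stand-ins for "use the Avro library to write to this stream"; they are not part of Avro.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class InMemoryStreamSketch {

  // Hypothetical stand-in for the library call that writes to a stream.
  // With Avro this is where a DataFileWriter would be created on `out`.
  static void writePayload(OutputStream out) throws IOException {
    out.write("JoeCool".getBytes(StandardCharsets.UTF_8));
  }

  // In-memory variant: the bytes never touch the file system.
  static byte[] toByteArray() throws IOException {
    try (ByteArrayOutputStream buffer = new ByteArrayOutputStream()) {
      writePayload(buffer);
      return buffer.toByteArray();
    }
  }

  // Fallback variant: write a temporary file, then read it back into a byte array.
  static byte[] viaTempFile() throws IOException {
    Path tmp = Files.createTempFile("person", ".bin");
    try {
      try (OutputStream out = Files.newOutputStream(tmp)) {
        writePayload(out);
      }
      return Files.readAllBytes(tmp);
    } finally {
      Files.deleteIfExists(tmp);
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("in-memory bytes: " + toByteArray().length);
    System.out.println("temp-file bytes: " + viaTempFile().length);
  }
}
```

Both variants yield the same bytes. To read such a blob back, Avro's DataFileStream accepts any InputStream, for example a ByteArrayInputStream wrapped around the stored array.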

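For the second question, I think you are running into the issue described in this Jira ticket: https://issues.apache.org/jira/browse/AVRO-2890. Currently, the JSON decoder expects the schema the data was written with, and it cannot perform any kind of schema evolution when given a schema different from the writer's.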
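
One way to work within that limitation, sketched below under assumptions, is to give the JSON decoder the writer schema (Person) and let a datum reader constructed with both the writer and the reader schema do the resolution. The sketch assumes Apache Avro is on the classpath. It uses GenericRecord and schemas built in code so that it does not depend on the generated Person classes; the class name and the schema and readEvolved helpers are hypothetical.

```java
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.JsonDecoder;

public class JsonEvolutionSketch {

  // Builds a record whose fields are all ["null", "string"] with default null,
  // mirroring the shape of Person.avsc and PersonWithMiddleName.avsc.
  static Schema schema(String name, String... fieldNames) {
    SchemaBuilder.FieldAssembler<Schema> fields =
        SchemaBuilder.record(name).namespace("org.acme.avro_testing").fields();
    for (String fieldName : fieldNames) {
      fields = fields.optionalString(fieldName);
    }
    return fields.endRecord();
  }

  // Parses JSON written with the Person schema and resolves it to PersonWithMiddleName.
  static GenericRecord readEvolved(String personJson) throws IOException {
    Schema writer = schema("Person", "firstName", "lastName");
    Schema reader = schema("PersonWithMiddleName", "firstName", "middleName", "lastName");

    // The JSON decoder must be given the schema the data was written with.
    JsonDecoder decoder = DecoderFactory.get().jsonDecoder(writer, personJson);

    // The datum reader knows both schemas and fills middleName from its default.
    GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<>(writer, reader);
    return datumReader.read(null, decoder);
  }

  public static void main(String[] args) throws IOException {
    String json = "{\"firstName\":{\"string\":\"Joe\"},\"lastName\":{\"string\":\"Cool\"}}";
    System.out.println(readEvolved(json));
  }
}
```

This still requires knowing the writer schema at read time, so it needs to be stored alongside the JSON or be otherwise recoverable.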
