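Using the Avro schemas and test code defined below, I have a couple of questions about Avro schema evolution, specifically how to store the first version of some Avro data and later retrieve it using the second version of the schema. In my example, Person.avsc represents the first version and PersonWithMiddleName.avsc represents the second version, in which we added a middleName attribute.

1. Is there a way to store the schema together with the data when serializing to a byte array rather than to a file? See the log line "The Person is now serialized to a byte array: JoeCool" in the test output, and compare serializing the Person to a byte array with writing it out to the person.avro file during the test. As you can see, the schema appears to be written out only with the file, not with the byte array.
2. Is there a way to serialize the Person object to JSON and then deserialize that JSON into a PersonWithMiddleName? The test below attempts this and fails with an AvroTypeException.

Java test code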
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import org.apache.avro.AvroTypeException;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonDecoder;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SchemaEvolutionTest {

  Logger log = LoggerFactory.getLogger(this.getClass());

  @Test
  public void createAndReadPerson() {
    // Create the Person using the Person schema
    var person = new Person();
    person.setFirstName("Joe");
    person.setLastName("Cool");
    log.info("Person has been created: {}", person);
    SpecificDatumWriter<Person> personSpecificDatumWriter =
        new SpecificDatumWriter<Person>(Person.class);
    DataFileWriter<Person> dataFileWriter = new DataFileWriter<Person>(personSpecificDatumWriter);
    try {
      dataFileWriter.create(person.getSchema(), new File("person.avro"));
      dataFileWriter.append(person);
      dataFileWriter.close();
    } catch (IOException e) {
      Assertions.fail();
    }
    log.info("Person has been written to an Avro file");

    // ******************************************************************************************************
    // Next, read as Person from the Avro file using the Person schema
    DatumReader<Person> personDatumReader =
        new SpecificDatumReader<Person>(Person.getClassSchema());
    var personAvroFile = new File("person.avro");
    DataFileReader<Person> personDataFileReader = null;
    try {
      personDataFileReader = new DataFileReader<Person>(personAvroFile, personDatumReader);
    } catch (IOException e1) {
      Assertions.fail();
    }
    Person personReadFromFile = null;
    while (personDataFileReader.hasNext()) {
      // Reuse object by passing it to next(). This saves us from
      // allocating and garbage collecting many objects for files with
      // many items.
      try {
        personReadFromFile = personDataFileReader.next(person);
      } catch (IOException e) {
        Assertions.fail();
      }
    }
    log.info("Person read from the file: {}", personReadFromFile.toString());

    // ******************************************************************************************************
    // Read the Person from the Person file as PersonWithMiddleName using only the
    // PersonWithMiddleName schema
    DatumReader<PersonWithMiddleName> personWithMiddleNameDatumReader =
        new SpecificDatumReader<PersonWithMiddleName>(PersonWithMiddleName.getClassSchema());
    DataFileReader<PersonWithMiddleName> personWithMiddleNameDataFileReader = null;
    try {
      personWithMiddleNameDataFileReader =
          new DataFileReader<PersonWithMiddleName>(personAvroFile, personWithMiddleNameDatumReader);
    } catch (IOException e1) {
      Assertions.fail();
    }
    PersonWithMiddleName personWithMiddleName = null;
    while (personWithMiddleNameDataFileReader.hasNext()) {
      // Reuse object by passing it to next(). This saves us from
      // allocating and garbage collecting many objects for files with
      // many items.
      try {
        personWithMiddleName = personWithMiddleNameDataFileReader.next(personWithMiddleName);
      } catch (IOException e) {
        Assertions.fail();
      }
    }
    log.info(
        "Now a PersonWithMiddleName has been read from the file that was written as a Person: {}",
        personWithMiddleName.toString());

    // ******************************************************************************************************
    // Serialize the Person to a byte array
    byte[] personByteArray = new byte[0];
    ByteArrayOutputStream personByteArrayOutputStream = new ByteArrayOutputStream();
    Encoder encoder = null;
    try {
      encoder = EncoderFactory.get().binaryEncoder(personByteArrayOutputStream, null);
      personSpecificDatumWriter.write(person, encoder);
      encoder.flush();
      personByteArray = personByteArrayOutputStream.toByteArray();
    } catch (IOException e) {
      log.error("Serialization error:" + e.getMessage());
    }
    log.info("The Person is now serialized to a byte array: {}", new String(personByteArray));

    // ******************************************************************************************************
    // Deserialize the Person byte array into a Person object
    BinaryDecoder binaryDecoder = null;
    Person decodedPerson = null;
    try {
      binaryDecoder = DecoderFactory.get().binaryDecoder(personByteArray, null);
      decodedPerson = personDatumReader.read(null, binaryDecoder);
    } catch (IOException e) {
      log.error("Deserialization error:" + e.getMessage());
    }
    log.info("Decoded Person from byte array {}", decodedPerson.toString());

    // ******************************************************************************************************
    // Deserialize the Person byte array into a PersonWithMiddleName object
    PersonWithMiddleName decodedPersonWithMiddleName = null;
    try {
      binaryDecoder = DecoderFactory.get().binaryDecoder(personByteArray, null);
      decodedPersonWithMiddleName = personWithMiddleNameDatumReader.read(null, binaryDecoder);
    } catch (IOException e) {
      log.error("Deserialization error:" + e.getMessage());
    }
    log.info(
        "Decoded PersonWithMiddleName from byte array {}", decodedPersonWithMiddleName.toString());

    // ******************************************************************************************************
    // Serialize the Person to JSON
    byte[] jsonByteArray = new byte[0];
    personByteArrayOutputStream = new ByteArrayOutputStream();
    Encoder jsonEncoder = null;
    try {
      jsonEncoder =
          EncoderFactory.get().jsonEncoder(Person.getClassSchema(), personByteArrayOutputStream);
      personSpecificDatumWriter.write(person, jsonEncoder);
      jsonEncoder.flush();
      jsonByteArray = personByteArrayOutputStream.toByteArray();
    } catch (IOException e) {
      log.error("Serialization error:" + e.getMessage());
    }
    log.info("The Person is now serialized to JSON: {}", new String(jsonByteArray));

    // ******************************************************************************************************
    // Deserialize the Person JSON into a Person object
    JsonDecoder jsonDecoder = null;
    try {
      jsonDecoder =
          DecoderFactory.get().jsonDecoder(Person.getClassSchema(), new String(jsonByteArray));
      decodedPerson = personDatumReader.read(null, jsonDecoder);
    } catch (IOException e) {
      log.error("Deserialization error:" + e.getMessage());
    }
    log.info("Decoded Person from JSON: {}", decodedPerson.toString());

    // ******************************************************************************************************
    // Deserialize the Person JSON into a PersonWithMiddleName object
    try {
      jsonDecoder =
          DecoderFactory.get()
              .jsonDecoder(PersonWithMiddleName.getClassSchema(), new String(jsonByteArray));
      decodedPersonWithMiddleName = personWithMiddleNameDatumReader.read(null, jsonDecoder);
    } catch (AvroTypeException ae) {
      // Do nothing. We expect this since JSON didn't serialize anything out.
      log.error(
          "An AvroTypeException occurred trying to deserialize Person JSON back into a PersonWithMiddleName. Here's the exception: {}",
          ae.getMessage());
    } catch (Exception e) {
      log.error("Deserialization error:" + e.getMessage());
    }
  }
}
Person.avsc
{
"type": "record",
"namespace": "org.acme.avro_testing",
"name": "Person",
"fields": [
{
"name": "firstName",
"type": ["null", "string"],
"default": null
},
{
"name": "lastName",
"type": ["null", "string"],
"default": null
}
]
}
PersonWithMiddleName.avsc
{
"type": "record",
"namespace": "org.acme.avro_testing",
"name": "PersonWithMiddleName",
"fields": [
{
"name": "firstName",
"type": ["null", "string"],
"default": null
},
{
"name": "middleName",
"type": ["null", "string"],
"default": null
},
{
"name": "lastName",
"type": ["null", "string"],
"default": null
}
]
}
Test output
Person has been created: {"firstName": "Joe", "lastName": "Cool"}
Person has been written to an Avro file
Person read from the file: {"firstName": "Joe", "lastName": "Cool"}
Now a PersonWithMiddleName has been read from the file that was written as a Person: {"firstName": "Joe", "middleName": null, "lastName": "Cool"}
The Person is now serialized to a byte array: JoeCool
Decoded Person from byte array {"firstName": "Joe", "lastName": "Cool"}
Decoded PersonWithMiddleName from byte array {"firstName": "Joe", "middleName": null, "lastName": "Cool"}
The Person is now serialized to JSON: {"firstName":{"string":"Joe"},"lastName":{"string":"Cool"}}
Decoded Person from JSON: {"firstName": "Joe", "lastName": "Cool"}
An AvroTypeException occurred trying to deserialize Person JSON back into a PersonWithMiddleName. Here's the exception: Expected field name not found: middleName
person.avro
Objavro.schema�{"type":"record","name":"Person","namespace":"org.acme.avro_testing","fields":[{"name":"firstName","type":["null","string"],"default":null},{"name":"lastName","type":["null","string"],"default":null}]}
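Answer 0 (score: 0)
For the first question: I'm not a Java expert, but in Python, rather than writing to an actual file, there is the concept of a file-like object, which has the same interface as a file but simply writes to a bytes buffer. For example, instead of doing this: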
file = open(file_name, "wb")
# use avro library to write to file
file.close()
you can do this:
from io import BytesIO
bytes_interface = BytesIO()
# use bytes_interface the same way you would the previous "file" object
byte_output = bytes_interface.getvalue()
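So in the end, byte_output will contain the bytes that would normally have been written to the file, but now it is just a bytes buffer that you can store anywhere. Does Java have a similar concept? Alternatively, if you absolutely must go through the process of writing to an actual temporary file, I assume Java has some way to read the file contents back into a bytes buffer.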
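Java's closest counterpart to BytesIO is java.io.ByteArrayOutputStream, and Avro's DataFileWriter has a create(Schema, OutputStream) overload alongside the File one used in the question. The following is only a minimal sketch of that idea, not the asker's code: it assumes the generic API (GenericRecord) instead of the generated Person classes so that it is self-contained, it rebuilds the two schemas with SchemaBuilder, and the class and method names (InMemoryContainerSketch, writeContainer, readFirst) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class InMemoryContainerSketch {

  // Assumed equivalents of Person.avsc and PersonWithMiddleName.avsc:
  // optionalString() produces a ["null", "string"] union with default null.
  static final Schema PERSON =
      SchemaBuilder.record("Person").namespace("org.acme.avro_testing")
          .fields().optionalString("firstName").optionalString("lastName").endRecord();

  static final Schema PERSON_WITH_MIDDLE_NAME =
      SchemaBuilder.record("PersonWithMiddleName").namespace("org.acme.avro_testing")
          .fields().optionalString("firstName").optionalString("middleName")
          .optionalString("lastName").endRecord();

  // Write the record in the Avro container format (header with schema + data)
  // into a byte array instead of a file.
  static byte[] writeContainer(GenericRecord record) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (DataFileWriter<GenericRecord> writer =
        new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(PERSON))) {
      writer.create(PERSON, out);
      writer.append(record);
    }
    return out.toByteArray();
  }

  // Read the first record back. The writer schema comes from the container
  // header; readerSchema is the schema the caller wants the result in.
  static GenericRecord readFirst(byte[] bytes, Schema readerSchema) throws IOException {
    GenericDatumReader<GenericRecord> datumReader =
        new GenericDatumReader<GenericRecord>(readerSchema);
    try (DataFileStream<GenericRecord> stream =
        new DataFileStream<GenericRecord>(new ByteArrayInputStream(bytes), datumReader)) {
      return stream.next();
    }
  }

  public static void main(String[] args) throws IOException {
    GenericRecord joe = new GenericData.Record(PERSON);
    joe.put("firstName", "Joe");
    joe.put("lastName", "Cool");
    byte[] bytes = writeContainer(joe);
    System.out.println(readFirst(bytes, PERSON_WITH_MIDDLE_NAME));
  }
}
```

The resulting byte array carries the same header as person.avro, so it can be stored anywhere and later read with a newer schema. The trade-off is size: the schema is repeated in every such byte array, which is presumably why the plain binary encoder leaves it out.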
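For the second question, I think you are running into the same problem described in this Jira ticket: https://issues.apache.org/jira/browse/AVRO-2890. Currently, the JSON decoder expects the schema the data was written with, and it cannot perform any kind of schema evolution when given a schema different from the one used to write the data.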
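If the writer schema is still available at read time, one workaround that fits this limitation is to give the JSON decoder the writer schema and let the datum reader resolve to the newer schema through its two-argument (writer, reader) constructor. The sketch below is an assumption-laden illustration rather than a confirmed fix for the ticket: it uses the generic API instead of the generated classes, rebuilds both schemas with SchemaBuilder, and the names JsonEvolutionSketch, toJson and fromJson are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonDecoder;
import org.apache.avro.io.JsonEncoder;

public class JsonEvolutionSketch {

  // Assumed equivalents of Person.avsc and PersonWithMiddleName.avsc.
  static final Schema PERSON =
      SchemaBuilder.record("Person").namespace("org.acme.avro_testing")
          .fields().optionalString("firstName").optionalString("lastName").endRecord();

  static final Schema PERSON_WITH_MIDDLE_NAME =
      SchemaBuilder.record("PersonWithMiddleName").namespace("org.acme.avro_testing")
          .fields().optionalString("firstName").optionalString("middleName")
          .optionalString("lastName").endRecord();

  // Encode a Person record with Avro's JSON encoding.
  static String toJson(GenericRecord record) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    JsonEncoder encoder = EncoderFactory.get().jsonEncoder(PERSON, out);
    new GenericDatumWriter<GenericRecord>(PERSON).write(record, encoder);
    encoder.flush();
    return out.toString("UTF-8");
  }

  // Parse the JSON with the schema it was written with, then let the datum
  // reader resolve the result into the reader schema (filling in defaults).
  static GenericRecord fromJson(String json, Schema writerSchema, Schema readerSchema)
      throws IOException {
    JsonDecoder decoder = DecoderFactory.get().jsonDecoder(writerSchema, json);
    return new GenericDatumReader<GenericRecord>(writerSchema, readerSchema).read(null, decoder);
  }

  public static void main(String[] args) throws IOException {
    GenericRecord joe = new GenericData.Record(PERSON);
    joe.put("firstName", "Joe");
    joe.put("lastName", "Cool");
    String json = toJson(joe);
    System.out.println(fromJson(json, PERSON, PERSON_WITH_MIDDLE_NAME));
  }
}
```

The difference from the failing step in the test is that the decoder is never handed PersonWithMiddleName, so it does not go looking for a middleName key in the JSON. The cost is that the reader has to know which schema wrote each JSON document, for example by storing a schema version next to it.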