I am trying to recreate a schema evolution case (backward compatibility) using avro-python3.
I have two schemas:
import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

schema_v1 = avro.schema.Parse("""
{
    "type": "record",
    "namespace": "com.example",
    "name": "CustomerV1",
    "fields": [
        { "name": "first_name", "type": "string", "doc": "First Name of Customer" },
        { "name": "last_name", "type": "string", "doc": "Last Name of Customer" },
        { "name": "age", "type": "int", "doc": "Age at the time of registration" },
        { "name": "height", "type": "float", "doc": "Height at the time of registration in cm" },
        { "name": "weight", "type": "float", "doc": "Weight at the time of registration in kg" },
        { "name": "automated_email", "type": "boolean", "default": true, "doc": "Field indicating if the user is enrolled in marketing emails" }
    ]
}
""")

schema_v2 = avro.schema.Parse("""
{
    "type": "record",
    "namespace": "com.example",
    "name": "CustomerV2",
    "fields": [
        { "name": "first_name", "type": "string", "doc": "First Name of Customer" },
        { "name": "last_name", "type": "string", "doc": "Last Name of Customer" },
        { "name": "age", "type": "int", "doc": "Age at the time of registration" },
        { "name": "height", "type": "float", "doc": "Height at the time of registration in cm" },
        { "name": "weight", "type": "float", "doc": "Weight at the time of registration in kg" },
        { "name": "phone_number", "type": ["null", "string"], "default": null, "doc": "optional phone number" },
        { "name": "email", "type": "string", "default": "missing@example.com", "doc": "email address" }
    ]
}
""")
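The second schema does not have the automated_email field, but it has two additional fields: phone_number and email.
If I write an Avro record with schema_v1, then according to the Avro schema evolution rules: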
writer = DataFileWriter(open("customer_v1.avro", "wb"), DatumWriter(), schema_v1)
writer.append({
    "first_name": "John",
    "last_name": "Doe",
    "age": 34,
    "height": 178.0,
    "weight": 75.0,
    "automated_email": True
})
writer.close()
...I should be able to read it with schema_v2, as long as default values exist for the fields that are missing from the written data:
reader = DataFileReader(open("customer_v1.avro", "rb"), DatumReader(reader_schema=schema_v2))
# Iterating over the reader yields one decoded record per step.
for record in reader:
    print(record)
reader.close()
But I get the following error:
SchemaResolutionException: Schemas do not match.
I know this works in Java. It is an example from a video course. Is there a way to make it work in Python?
Answer 0 (score: 0)
fastavro (an alternative Python implementation) handles this just fine.
The code to write with the first schema is here:
s1 = {
    "type": "record",
    "namespace": "com.example",
    "name": "CustomerV1",
    "fields": [
        {"name": "first_name", "type": "string", "doc": "First Name of Customer"},
        {"name": "last_name", "type": "string", "doc": "Last Name of Customer"},
        {"name": "age", "type": "int", "doc": "Age at the time of registration"},
        {
            "name": "height",
            "type": "float",
            "doc": "Height at the time of registration in cm",
        },
        {
            "name": "weight",
            "type": "float",
            "doc": "Weight at the time of registration in kg",
        },
        {
            "name": "automated_email",
            "type": "boolean",
            "default": True,
            "doc": "Field indicating if the user is enrolled in marketing emails",
        },
    ],
}

record = {
    "first_name": "John",
    "last_name": "Doe",
    "age": 34,
    "height": 178.0,
    "weight": 75.0,
    "automated_email": True,
}
import fastavro

# Write a single record using the first schema.
with open("test.avro", "wb") as fp:
    fastavro.writer(fp, fastavro.parse_schema(s1), [record])
And to read it with the second schema:
s2 = {
    "type": "record",
    "namespace": "com.example",
    "name": "CustomerV2",
    "fields": [
        {"name": "first_name", "type": "string", "doc": "First Name of Customer"},
        {"name": "last_name", "type": "string", "doc": "Last Name of Customer"},
        {"name": "age", "type": "int", "doc": "Age at the time of registration"},
        {
            "name": "height",
            "type": "float",
            "doc": "Height at the time of registration in cm",
        },
        {
            "name": "weight",
            "type": "float",
            "doc": "Weight at the time of registration in kg",
        },
        {
            "name": "phone_number",
            "type": ["null", "string"],
            "default": None,
            "doc": "optional phone number",
        },
        {
            "name": "email",
            "type": "string",
            "default": "missing@example.com",
            "doc": "email address",
        },
    ],
}
import fastavro

# Pass the second schema as the reader schema; the writer schema comes from the file header.
with open("test.avro", "rb") as fp:
    for record in fastavro.reader(fp, fastavro.parse_schema(s2)):
        print(record)
The output has the new fields, as expected:
{'first_name': 'John', 'last_name': 'Doe', 'age': 34, 'height': 178.0, 'weight': 75.0, 'phone_number': None, 'email': 'missing@example.com'}
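For readers wondering what happens to each field during that read, here is a rough, library-free sketch of the record resolution rule the question describes: fields the reader does not know are dropped, and reader fields missing from the data take their default. `resolve_record` is a hypothetical helper written for illustration, not fastavro's actual implementation, and it ignores type promotion, nested records, and aliases.

```python
def resolve_record(datum, reader_fields):
    """Project a decoded writer record onto the reader's field list (simplified)."""
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in datum:
            # Field exists in the written data: keep its value.
            resolved[name] = datum[name]
        elif "default" in field:
            # Field is new in the reader schema: fall back to its default.
            resolved[name] = field["default"]
        else:
            # No value and no default: the schemas cannot be resolved.
            raise ValueError(f"no value or default for reader field {name!r}")
    # Writer-only fields (here: automated_email) are never copied over.
    return resolved


written = {
    "first_name": "John",
    "last_name": "Doe",
    "age": 34,
    "height": 178.0,
    "weight": 75.0,
    "automated_email": True,
}
reader_fields = [
    {"name": "first_name", "type": "string"},
    {"name": "last_name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "height", "type": "float"},
    {"name": "weight", "type": "float"},
    {"name": "phone_number", "type": ["null", "string"], "default": None},
    {"name": "email", "type": "string", "default": "missing@example.com"},
]

resolved = resolve_record(written, reader_fields)
print(resolved)
```

If email had no default in the reader schema, this sketch would raise instead of returning a record, which mirrors the "as long as defaults exist" condition from the question.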
Answer 1 (score: 0)
If you rename the second schema from CustomerV2 to CustomerV1, it works with avro-python3 version 1.10.0.
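The Avro specification's schema resolution rules only match two record schemas when they have the same name, which would explain both the "Schemas do not match" error and why the rename helps. A minimal sketch of such a check follows, assuming the library compares full names (namespace plus name); `full_name` and `record_names_match` are hypothetical helpers for illustration, not part of avro-python3.

```python
def full_name(schema):
    """Return namespace.name for a record schema given as a dict (simplified)."""
    name = schema["name"]
    namespace = schema.get("namespace")
    # A dotted name is already fully qualified; otherwise prepend the namespace.
    if "." in name or not namespace:
        return name
    return f"{namespace}.{name}"


def record_names_match(writer_schema, reader_schema):
    """Assumed rule: records resolve only when their full names are equal."""
    return full_name(writer_schema) == full_name(reader_schema)


v1 = {"type": "record", "namespace": "com.example", "name": "CustomerV1", "fields": []}
v2 = {"type": "record", "namespace": "com.example", "name": "CustomerV2", "fields": []}
v2_renamed = dict(v2, name="CustomerV1")  # the rename suggested in this answer

print(record_names_match(v1, v2))
print(record_names_match(v1, v2_renamed))
```

Under this assumption the field lists never get compared for the original pair, because the name check fails first.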