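I want to implement a configurable Kafka stream that reads a row of data and applies a list of transformations to it, such as applying a function to a field of the record or renaming a field. The stream should be fully configurable, so that I can specify which transformation is applied to which field. I use Avro to encode the data as GenericRecords. My problem is that I also need transformations that create new columns. These should not overwrite the previous value of a field; they should append a new field to the record, which means the record's schema changes. The solution I have come up with so far is to first iterate over the list of transformations to determine which fields I need to add to the schema. I then create a new schema that contains both the old and the new fields.
The list of transformations (there is always one source field, which is passed to the transform method; the result is then written to targetField):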
// Transform is not shown in the question; it is assumed here to be the common supertype of all transforms
trait Transform
case class FieldTransform(field: String, targetField: String, method: String) extends Transform

val transforms: List[Transform] = List(
  FieldTransform(field = "referrer", targetField = "referrer", method = "mask"),
  FieldTransform(field = "name", targetField = "name_clean", method = "replaceUmlauts")
)
The method that creates the new schema from the old schema and the list of transformations:
import org.apache.avro.{Schema, SchemaBuilder}
import scala.collection.JavaConverters._

def getExtendedSchema(schema: Schema, transforms: List[Transform]): Schema = {
  val assembler = SchemaBuilder
    .builder(schema.getNamespace)
    .record(schema.getName)
    .fields()
  // create new schema with the existing fields and the new fields which are created through transforms
  // (schema.getFields is a java.util.List, so it is converted before concatenating)
  val fields = schema.getFields.asScala ++ getNewFields(schema, transforms)
  fields
    .foldLeft(assembler) { (builder, field) =>
      builder
        .name(field.name)
        .`type`(field.schema())
        .noDefault()
      // TODO: find way to differentiate between explicitly set null defaults and fields which have no default
      //.withDefault(field.defaultValue())
    }
    .endRecord()
}
def getNewFields(schema: Schema, transforms: List[Transform]): List[Schema.Field] = {
  transforms
    .filter { // only select targetFields which are not in schema
      case FieldTransform(_, targetField, _) => schema.getField(targetField) == null
      case _ => false
    }
    .distinct
    .map { // create new Field object for each targetField
      case FieldTransform(field, targetField, _) =>
        val sourceField = schema.getField(field)
        new Schema.Field(targetField, sourceField.schema(), sourceField.doc(), sourceField.defaultValue())
    }
}
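The field-discovery step can also be looked at independently of Avro. What follows is only a minimal sketch in plain Java, under the assumption that field names alone decide what has to be added. `NewFieldPlanner` and `newFields` are hypothetical names and not part of Avro or of the code above. It de-duplicates by targetField rather than by the whole transform, because two transforms that write to the same new field with different methods would otherwise yield two schema fields with the same name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; it does not use the Avro API.
final class NewFieldPlanner {

    /** Plain-Java mirror of the FieldTransform case class from the question. */
    static final class FieldTransform {
        final String field;
        final String targetField;
        final String method;

        FieldTransform(String field, String targetField, String method) {
            this.field = field;
            this.targetField = targetField;
            this.method = method;
        }
    }

    /**
     * Returns a map from each new target field to the source field whose
     * type it would inherit. Insertion order follows the transform list,
     * and the first transform that names a target field wins.
     */
    static Map<String, String> newFields(Set<String> existingFields,
                                         List<FieldTransform> transforms) {
        Map<String, String> planned = new LinkedHashMap<>();
        for (FieldTransform t : transforms) {
            if (existingFields.contains(t.targetField)) {
                continue; // overwrites an existing field, so the schema is unchanged
            }
            if (!existingFields.contains(t.field)) {
                // Assumption of this sketch: source fields must exist in the old schema.
                throw new IllegalArgumentException("Unknown source field: " + t.field);
            }
            planned.putIfAbsent(t.targetField, t.field);
        }
        return planned;
    }
}
```

With the two transforms from the question and an old schema containing referrer and name, the sketch plans a single new field, name_clean, derived from name. Failing fast on an unknown source field is a design choice of the sketch; it avoids the null lookup that schema.getField(field) would return in that case.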
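Instantiating the new GenericRecord from the old record:
val extendedSchema = getExtendedSchema(row.getSchema, transforms)
val extendedRow = new GenericData.Record(extendedSchema)
// copy the existing values; asScala requires scala.collection.JavaConverters._
for (field <- row.getSchema.getFields.asScala) {
  extendedRow.put(field.name, row.get(field.name))
}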
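Once the old values have been copied, each transform still has to read its source field and write the result to its target field. The sketch below illustrates that step in plain Java, with a `Map` standing in for the GenericRecord. `TransformApplier`, the `METHODS` registry and both function bodies are assumptions made for illustration, since the question does not show how mask or replaceUmlauts are implemented.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical helper for illustration only; a Map stands in for the Avro record.
final class TransformApplier {

    /** Assumed registry: method name from the configuration -> string function. */
    static final Map<String, UnaryOperator<String>> METHODS = new HashMap<>();

    static {
        // Placeholder implementations; the real ones are not shown in the question.
        METHODS.put("mask", value -> value.replaceAll(".", "*"));
        METHODS.put("replaceUmlauts", value -> value
                .replace("\u00e4", "ae")   // a-umlaut
                .replace("\u00f6", "oe")   // o-umlaut
                .replace("\u00fc", "ue")   // u-umlaut
                .replace("\u00c4", "Ae")   // capital A-umlaut
                .replace("\u00d6", "Oe")   // capital O-umlaut
                .replace("\u00dc", "Ue")   // capital U-umlaut
                .replace("\u00df", "ss")); // sharp s
    }

    /**
     * Returns a copy of the row in which the named method has been applied
     * to the source field and the result stored under targetField. If
     * targetField equals field the value is overwritten; otherwise a new
     * entry is appended and the source value is left untouched.
     */
    static Map<String, Object> apply(Map<String, Object> row, String field,
                                     String targetField, String method) {
        UnaryOperator<String> fn = METHODS.get(method);
        if (fn == null) {
            throw new IllegalArgumentException("Unknown method: " + method);
        }
        Map<String, Object> out = new LinkedHashMap<>(row);
        Object value = row.get(field);
        out.put(targetField, value == null ? null : fn.apply(value.toString()));
        return out;
    }
}
```

With an Avro record, the same step would read the value with get and write it to the extended record with put. Looking methods up by name keeps the stream configurable, because the transform list can then be loaded from configuration without code changes.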
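I tried to look for other solutions, but could not find any examples that change the data type. I feel there must be a simpler, cleaner solution for handling Avro schemas that change at runtime. Any ideas are appreciated.
Thanks, Paul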
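Answer 0 (score: 0)
I have implemented passing dynamic values into an Avro schema and validating them against that schema, including unions.
Example: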
RestTemplate template = new RestTemplate();
HttpHeaders headers = new HttpHeaders();
headers.setContentType(MediaType.APPLICATION_JSON);
HttpEntity<String> entity = new HttpEntity<>(headers);
// fetch the schema for the given subject and version from the schema registry
String url = registryUrl + "/subjects/" + topic + "/versions/" + version;
ResponseEntity<String> response = template.exchange(url, HttpMethod.GET, entity, String.class);
String responseData = response.getBody();
JSONObject jsonObject = new JSONObject(responseData);
// jsonResult is the JSON string with the record values, for example the body you send from Postman
JSONObject jsonObjectResult = new JSONObject(jsonResult);
String schemaJson = jsonObject.get("schema").toString();
Schema.Parser parser = new Schema.Parser();
Schema schema = parser.parse(schemaJson);
GenericRecord genericRecord = new GenericData.Record(schema);
// copy the value of every schema field from the JSON payload into the record
schema.getFields().forEach(field ->
    genericRecord.put(field.name(), jsonObjectResult.get(field.name())));
// validate the populated record against the schema
GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
boolean valid = reader.getData().validate(schema, genericRecord);