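How to fix "Expected start-union. Got VALUE_NUMBER_INT" when converting JSON to Avro on the command line?

Asked: 2014-12-15 13:50:07

Tags: json validation avro

I am trying to validate a JSON file against an Avro schema and write the corresponding Avro file. First, I defined the following Avro schema, named user.avsc: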

{"namespace": "example.avro",
 "type": "record",
 "name": "user",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}

Then I created a user.json file:

{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}

Then I tried to run:

java -jar ~/bin/avro-tools-1.7.7.jar fromjson --schema-file user.avsc user.json > user.avro

But I get the following exception:

Exception in thread "main" org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_NUMBER_INT
    at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:697)
    at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:441)
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
    at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:99)
    at org.apache.avro.tool.Main.run(Main.java:84)
    at org.apache.avro.tool.Main.main(Main.java:73)

Am I missing something? Why do I get "Expected start-union. Got VALUE_NUMBER_INT"?

4 answers:

Answer 0: (score: 29)

According to the explanation by Doug Cutting:

  

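Avro's JSON encoding requires that non-null union values be tagged with their intended type. This is because unions like ["bytes","string"] and ["int","long"] are ambiguous in JSON: the first are both encoded as JSON strings, while the second are both encoded as JSON numbers.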

http://avro.apache.org/docs/current/spec.html#json_encoding

  

Thus your record must be encoded as:

{"name": "Alyssa", "favorite_number": {"int": 7}, "favorite_color": null}
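The tagging rule above can be sketched in plain Java. This is only an illustration under stated assumptions: `UnionTagger`, `avroTypeName`, and `tag` are hypothetical names that are not part of Avro, only a handful of primitive types are handled, and the union is passed as a simple list of branch names rather than a parsed schema.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Avro: tags union values as Avro's JSON encoding expects. */
final class UnionTagger {

    /** Maps a plain Java value to an Avro primitive type name (assumption: primitives only). */
    static String avroTypeName(Object value) {
        if (value == null) return "null";
        if (value instanceof Integer) return "int";
        if (value instanceof Long) return "long";
        if (value instanceof Float) return "float";
        if (value instanceof Double) return "double";
        if (value instanceof Boolean) return "boolean";
        if (value instanceof String) return "string";
        throw new IllegalArgumentException("Unsupported value type: " + value.getClass());
    }

    /** Wraps a non-null value as {"<type>": value}; null is left untagged. */
    static Object tag(Object value, List<String> unionBranches) {
        String type = avroTypeName(value);
        if (!unionBranches.contains(type)) {
            throw new IllegalArgumentException("Type " + type + " is not a branch of " + unionBranches);
        }
        if (value == null) {
            return null;
        }
        Map<String, Object> tagged = new LinkedHashMap<>();
        tagged.put(type, value);
        return tagged;
    }

    public static void main(String[] args) {
        // Branches of the "favorite_number" field from user.avsc.
        List<String> branches = Arrays.asList("int", "null");
        System.out.println(tag(256, branches));
        System.out.println(tag(null, branches));
    }
}
```

Applied to the question's user.json, the value 256 for favorite_number would be wrapped under an "int" key, while the null favorite_color stays a bare null.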

Answer 1: (score: 10)

There is a new JSON encoder in the works that addresses this common issue:

https://issues.apache.org/jira/browse/AVRO-1582

https://github.com/zolyfarkas/avro

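Answer 2: (score: 1)

I implemented a union and its validation: just create a union schema and pass its values via Postman. The registry URL is the URL you specify in your Kafka properties; you can also pass dynamic values to your schema.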

// Fetch the schema for the given subject and version from the registry.
RestTemplate template = new RestTemplate();
HttpHeaders headers = new HttpHeaders();
headers.setContentType(MediaType.APPLICATION_JSON);
HttpEntity<String> entity = new HttpEntity<>(headers);
ResponseEntity<String> response = template.exchange(
        registryUrl + "/subjects/" + topic + "/versions/" + version,
        HttpMethod.GET, entity, String.class);
String responseData = response.getBody();

// The Avro schema is read from the "schema" key of the response.
JSONObject jsonObject = new JSONObject(responseData);
JSONObject jsonObjectResult = new JSONObject(jsonResult);
String getData = jsonObject.get("schema").toString();
Schema schema = new Schema.Parser().parse(getData);

// Copy each schema field from the incoming JSON into a generic record.
GenericRecord genericRecord = new GenericData.Record(schema);
schema.getFields().forEach(field ->
        genericRecord.put(field.name(), jsonObjectResult.get(field.name())));

// Validate the populated record against the schema.
GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
boolean data = reader.getData().validate(schema, genericRecord);

Answer 3: (score: 0)

As @Emre-Sevinc pointed out, the problem lies in the encoding of your Avro record.

To be more specific:

Don't do this:

   jsonRecord = avroGenericRecord.toString

Instead, do the following:

    val writer = new GenericDatumWriter[GenericRecord](avroSchema)
    val baos = new ByteArrayOutputStream
    val jsonEncoder = EncoderFactory.get.jsonEncoder(avroSchema, baos)
    writer.write(avroGenericRecord, jsonEncoder)
    jsonEncoder.flush

    val jsonRecord = baos.toString("UTF-8")

You will also need the following imports:

import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

After doing this, you will have a jsonRecord in which non-null union values are tagged with their intended type.

Hope this helps!