How do I define an Avro schema for a complex JSON document?

Asked: 2015-01-27 04:24:18

Tags: json serialization mapreduce avro

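I have a JSON document that I want to convert to Avro, and I need to specify a schema for that purpose. This is the JSON document I want to define an Avro schema for: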

{
 "uid": 29153333,
 "somefield": "somevalue",
 "options": [
   {
     "item1_lvl2": "a",
     "item2_lvl2": [
       {
         "item1_lvl3": "x1",
         "item2_lvl3": "y1"
       },
       {
         "item1_lvl3": "x2",
         "item2_lvl3": "y2"
       }
     ]
   }
 ]
}

I was able to define a schema for the non-complex types, but not for the complex "options" field:

{
  "namespace" : "my.com.ns",
  "type" :  "record",
  "fields" : [
     {"name": "uid", "type": "int"},
     {"name": "somefield", "type": "string"},
     {"name": "options", "type": .....}
  ]
}

Thanks for your help!

2 answers:

Answer 0 (score: 15)

You need to use Avro complex types, specifically arrays and records, and nest them inside one another:

{
  "namespace" : "my.com.ns",
  "name": "myrecord",
  "type" :  "record",
  "fields" : [
     {"name": "uid", "type": "int"},
     {"name": "somefield", "type": "string"},
     {"name": "options", "type": {
        "type": "array",
        "items": {
            "type": "record",
            "name": "lvl2_record",
            "fields": [
                {"name": "item1_lvl2", "type": "string"},
                {"name": "item2_lvl2", "type": {
                    "type": "array",
                    "items": {
                        "type": "record",
                        "name": "lvl3_record",
                        "fields": [
                            {"name": "item1_lvl3", "type": "string"},
                            {"name": "item2_lvl3", "type": "string"}
                        ]
                    }
                }}
            ]
        }
     }}
  ]
}
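
To see how the nesting lines up with the document, here is a minimal sketch in plain Python (standard library only, not the Avro library) that walks the schema and the JSON side by side. The `conforms` helper and the `PRIMITIVES` table are hypothetical names made up for this illustration, and only the `int`, `string`, `array` and `record` types used above are handled.

```python
# Hypothetical structural checker for the subset of Avro used in this answer.
# It is an illustration only; a real Avro library does far more than this.

LVL3 = {"type": "record", "name": "lvl3_record", "fields": [
    {"name": "item1_lvl3", "type": "string"},
    {"name": "item2_lvl3", "type": "string"},
]}
LVL2 = {"type": "record", "name": "lvl2_record", "fields": [
    {"name": "item1_lvl2", "type": "string"},
    {"name": "item2_lvl2", "type": {"type": "array", "items": LVL3}},
]}
SCHEMA = {"namespace": "my.com.ns", "name": "myrecord", "type": "record", "fields": [
    {"name": "uid", "type": "int"},
    {"name": "somefield", "type": "string"},
    {"name": "options", "type": {"type": "array", "items": LVL2}},
]}

DOC = {
    "uid": 29153333,
    "somefield": "somevalue",
    "options": [{
        "item1_lvl2": "a",
        "item2_lvl2": [
            {"item1_lvl3": "x1", "item2_lvl3": "y1"},
            {"item1_lvl3": "x2", "item2_lvl3": "y2"},
        ],
    }],
}

PRIMITIVES = {
    "string": lambda v: isinstance(v, str),
    # Avro "int" is a 32-bit signed integer; bool is excluded explicitly
    # because Python treats bool as a subclass of int.
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool)
                     and -2**31 <= v < 2**31,
}

def conforms(schema, value):
    """Return True if value has the shape described by schema."""
    if isinstance(schema, str):
        return PRIMITIVES[schema](value)
    kind = schema["type"]
    if kind in PRIMITIVES:
        return PRIMITIVES[kind](value)
    if kind == "array":
        return isinstance(value, list) and all(
            conforms(schema["items"], v) for v in value)
    if kind == "record":
        fields = schema["fields"]
        # Assumption of this sketch: every field is required, no extras allowed.
        return (isinstance(value, dict)
                and set(value) == {f["name"] for f in fields}
                and all(conforms(f["type"], value[f["name"]]) for f in fields))
    raise ValueError(f"unsupported type in this sketch: {kind!r}")

print(conforms(SCHEMA, DOC))
```

Each JSON array maps to an `array` schema and each JSON object maps to a `record`, so the schema ends up with the same depth as the document.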

Also, for better readability, you can split the schema into multiple files.
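
As a rough illustration of the idea of splitting, the sketch below keeps each record in its own definition and refers to the nested ones by name. The three dictionaries stand in for three separate schema files, and `inline` is a hypothetical helper written for this example in plain Python, not an Avro API. It ignores namespaces and handles only record fields, array items and unions.

```python
# Each dict stands in for the contents of one schema file (hypothetical split).
LVL3 = {"type": "record", "name": "lvl3_record", "fields": [
    {"name": "item1_lvl3", "type": "string"},
    {"name": "item2_lvl3", "type": "string"},
]}
LVL2 = {"type": "record", "name": "lvl2_record", "fields": [
    {"name": "item1_lvl2", "type": "string"},
    # Refers to lvl3_record by name instead of nesting its definition.
    {"name": "item2_lvl2", "type": {"type": "array", "items": "lvl3_record"}},
]}
TOP = {"namespace": "my.com.ns", "name": "myrecord", "type": "record", "fields": [
    {"name": "uid", "type": "int"},
    {"name": "somefield", "type": "string"},
    {"name": "options", "type": {"type": "array", "items": "lvl2_record"}},
]}

def inline(schema, registry, seen=None):
    """Return a copy of schema with named references replaced by definitions.

    A name is expanded only the first time it is met; later uses stay as
    plain name references, since a named type may be defined only once.
    """
    seen = set() if seen is None else seen
    if isinstance(schema, str):
        if schema in registry and schema not in seen:
            seen.add(schema)
            return inline(registry[schema], registry, seen)
        return schema  # primitive type or an already expanded name
    if isinstance(schema, list):  # union of types
        return [inline(s, registry, seen) for s in schema]
    out = dict(schema)  # shallow copy so the inputs are left untouched
    if "items" in out:
        out["items"] = inline(out["items"], registry, seen)
    if "fields" in out:
        out["fields"] = [dict(f, type=inline(f["type"], registry, seen))
                         for f in out["fields"]]
    return out

registry = {"lvl2_record": LVL2, "lvl3_record": LVL3}
merged = inline(TOP, registry)
```

Under these assumptions `merged` has the same nested shape as the single-file schema above, while each piece stays short enough to read on its own.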

Answer 1 (score: 4)

This online tool (http://avro4s-ui.landoop.com/) is very handy: it generates an Avro schema from a given valid JSON document.
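
For a sense of what generating a schema from a JSON sample involves, here is a minimal sketch in plain Python. It is not how the linked tool works internally; `infer` and the generated record names are hypothetical, and the sketch assumes arrays are non-empty and homogeneous (it looks only at the first element).

```python
import json

def infer(value, name):
    """Guess an Avro schema for a parsed JSON value (illustration only)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must come before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "int" if -2**31 <= value < 2**31 else "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        if not value:
            raise ValueError("cannot infer the item type of an empty array")
        # Assumption: every element has the same shape as the first one.
        return {"type": "array", "items": infer(value[0], name + "_item")}
    if isinstance(value, dict):
        return {
            "type": "record",
            "name": name,  # generated name, made unique by its path
            "fields": [{"name": k, "type": infer(v, f"{name}_{k}")}
                       for k, v in value.items()],
        }
    raise TypeError(f"unsupported JSON value: {value!r}")

sample = json.loads("""
{
 "uid": 29153333,
 "somefield": "somevalue",
 "options": [
   {
     "item1_lvl2": "a",
     "item2_lvl2": [
       {"item1_lvl3": "x1", "item2_lvl3": "y1"},
       {"item1_lvl3": "x2", "item2_lvl3": "y2"}
     ]
   }
 ]
}
""")

schema = infer(sample, "myrecord")
print(json.dumps(schema, indent=2))
```

A schema inferred from a single sample cannot know about optional fields or wider numeric ranges, so it is worth reviewing the result by hand before relying on it.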