Avro schema for writing a DataFrame to an .avro file - Spark / Scala

Time: 2020-09-14 10:36:39

Tags: avro spark-avro

I am using the Avro schema below to write a DataFrame to an .avro file with Spark/Scala.

{ "type": "record", "name": "userid", "namespace": "abc.com",
 "fields": [
    { "name": "userid", "type": "string" },
    { "name": "bag", "type":
      { "type": "array", "items":
        { "name": "tuple", "type": "record",
          "fields": [
          { "name": "key1", "type": "int" },
          { "name": "key2", "type": "int" },
          { "name": "key3", "type": "int" },
          { "name": "ts", "type": "long" } ]
        }
      }
    } ]
}

It produces the following output:

{"userid":{"string":"A991Yh3PLY9yTr"},
"bag":{"array":[
    {".bag.bag":{"key1":{"int":42304},"key2":{"int":2707},"key3":{"int":58008},"ts":{"long":1597158494}}},
    {".bag.bag":{"key1":{"int":42308},"key2":{"int":1774},"key3":{"int":195834},"ts":{"long":1597158596}}}
]}},
{"userid":{"string":"M891Bh7PLY9yNr"},
"bag":{"array":[
    {".bag.bag":{"key1":{"int":52304},"key2":{"int":5707},"key3":{"int":28008},"ts":{"long":1597158594}}},
    {".bag.bag":{"key1":{"int":52308},"key2":{"int":5774},"key3":{"int":295834},"ts":{"long":1597158664}}}
]}}

However, I expect the output to look like this:

{"userid":"A991Yh3PLY9yTr","bag":[{"key1":42304,"key2":2707,"key3":58008,"ts":1597158494},{"key1":42308,"key2":1774,"key3":195834,"ts":1597158596}]},
{"userid":"M891Bh7PLY9yNr","bag":[{"key1":52304,"key2":5707,"key3":28008,"ts":1597158594},{"key1":52308,"key2":5774,"key3":295834,"ts":1597158664}]}
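The difference between the two listings matches how Avro's JSON encoding represents union values: a non-null value in a union-typed field is wrapped in an object keyed by the chosen branch's type name (for a named record, its full name), whereas a field that is not a union is written bare. The sketch below assumes the Apache Avro Java library is on the classpath and uses a hypothetical one-field record `r`. It encodes the same string twice with `EncoderFactory.jsonEncoder`, once for a plain `"string"` field and once for a `["string","null"]` union. That the file in the question was actually written with nullable union types (for example, a schema derived by Spark rather than the one above) is only an inference; the question does not confirm it.

```scala
import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.EncoderFactory

object UnionJsonDemo {
  // Hypothetical record "r": the same field declared as a plain string...
  val plainSchema: Schema = new Schema.Parser().parse(
    """{"type":"record","name":"r","fields":[{"name":"userid","type":"string"}]}""")

  // ...and as a nullable union, the shape Spark typically uses for nullable columns.
  val unionSchema: Schema = new Schema.Parser().parse(
    """{"type":"record","name":"r","fields":[{"name":"userid","type":["string","null"]}]}""")

  // Builds a one-field record and serializes it with Avro's JSON encoder.
  def toAvroJson(schema: Schema, userid: String): String = {
    val record = new GenericData.Record(schema)
    record.put("userid", userid)
    val out = new ByteArrayOutputStream()
    val encoder = EncoderFactory.get().jsonEncoder(schema, out)
    new GenericDatumWriter[GenericRecord](schema).write(record, encoder)
    encoder.flush()
    out.toString("UTF-8")
  }

  def main(args: Array[String]): Unit = {
    // Plain field: the value is written bare.
    println(toAvroJson(plainSchema, "A991Yh3PLY9yTr"))
    // Union field: the value is wrapped in {"string": ...}.
    println(toAvroJson(unionSchema, "A991Yh3PLY9yTr"))
  }
}
```

The same wrapping is applied recursively, which is why every value in the first listing carries a type key and each array element is wrapped under a record name.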

Input DataFrame schema: (screenshot not available)
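Because the screenshot of the input schema is missing, here is a minimal sketch of a DataFrame whose Spark schema mirrors the Avro schema above, written with that schema passed explicitly through the spark-avro `avroSchema` writer option. It assumes Spark 2.4+ with the spark-avro module on the classpath. The object name `AvroWriteSketch`, the non-nullable Spark schema, and the output path are illustrative assumptions, not the asker's code. The sample rows are taken from the expected output above.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

object AvroWriteSketch {
  // The Avro schema from the question, passed verbatim to the writer.
  val avroSchemaJson: String =
    """{ "type": "record", "name": "userid", "namespace": "abc.com",
      |  "fields": [
      |    { "name": "userid", "type": "string" },
      |    { "name": "bag", "type":
      |      { "type": "array", "items":
      |        { "name": "tuple", "type": "record",
      |          "fields": [
      |            { "name": "key1", "type": "int" },
      |            { "name": "key2", "type": "int" },
      |            { "name": "key3", "type": "int" },
      |            { "name": "ts", "type": "long" } ] } } } ] }""".stripMargin

  // Element type of "bag"; non-nullable because the Avro fields are not null unions.
  val tupleType: StructType = StructType(Seq(
    StructField("key1", IntegerType, nullable = false),
    StructField("key2", IntegerType, nullable = false),
    StructField("key3", IntegerType, nullable = false),
    StructField("ts", LongType, nullable = false)))

  // Top-level Spark schema mirroring the Avro record.
  val sparkSchema: StructType = StructType(Seq(
    StructField("userid", StringType, nullable = false),
    StructField("bag", ArrayType(tupleType, containsNull = false), nullable = false)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("avro-write-sketch").master("local[*]").getOrCreate()

    val rows = Seq(
      Row("A991Yh3PLY9yTr", Seq(Row(42304, 2707, 58008, 1597158494L), Row(42308, 1774, 195834, 1597158596L))),
      Row("M891Bh7PLY9yNr", Seq(Row(52304, 5707, 28008, 1597158594L), Row(52308, 5774, 295834, 1597158664L))))
    val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), sparkSchema)

    // "avroSchema" makes spark-avro use this schema instead of deriving one from df.
    df.write
      .format("avro")
      .option("avroSchema", avroSchemaJson)
      .mode("overwrite")
      .save("/tmp/avro-write-sketch") // hypothetical output directory
    spark.stop()
  }
}
```

Keeping the Spark fields non-nullable keeps them aligned with the Avro schema, which declares no `null` unions. Whether this reproduces the expected output exactly has not been verified here.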

Can anyone suggest what changes I need to make to the Avro schema to get the expected output?

0 answers:

No answers yet.