I am using the NiFi flow ListFile >> FetchFile >> SplitJson >> UpdateAttribute >> FlattenJson >> InferAvroSchema >> ConvertRecord >> MergeRecord >> PutParquet.
JSON input:
[
  {
    "Id": 1235,
    "Username": "fred1235",
    "Name": "Fred",
    "ShippingAddress": {
      "Address1": "456 Main St.",
      "Address2": "",
      "City": "Durham",
      "State": "NC"
    }
  },
  {
    "Id": 1236,
    "Username": "larry1234",
    "Name": "Larry",
    "ShippingAddress": {
      "Address1": "789 Main St.",
      "Address2": "",
      "City": "Durham",
      "State": "NC",
      "PostalCode": 277453
    },
    "Orders": [
      {
        "ItemId": 1111,
        "OrderDate": "11/11/2012"
      },
      {
        "ItemId": 2222,
        "OrderDate": "12/12/2012"
      }
    ]
  }
]
The MergeRecord processor does not include the "Orders" array in the schema of the merged file. The MergeContent processor has the same problem.
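One way to see how the array can get lost: after SplitJson, each FlowFile holds a single record, so a schema inferred per FlowFile only reflects that one record's fields. The sketch below flattens each sample record on its own and compares the resulting field sets. The `flatten` helper is hypothetical (dot-separated keys, array indexes as path segments); it only approximates FlattenJson and is not NiFi code. It assumes the merged file ends up using the schema inferred from the first record, which has no Orders.

```python
# Sample input from the question: two records, only the second has Orders
RECORDS = [
    {"Id": 1235, "Username": "fred1235", "Name": "Fred",
     "ShippingAddress": {"Address1": "456 Main St.", "Address2": "",
                         "City": "Durham", "State": "NC"}},
    {"Id": 1236, "Username": "larry1234", "Name": "Larry",
     "ShippingAddress": {"Address1": "789 Main St.", "Address2": "",
                         "City": "Durham", "State": "NC", "PostalCode": 277453},
     "Orders": [{"ItemId": 1111, "OrderDate": "11/11/2012"},
                {"ItemId": 2222, "OrderDate": "12/12/2012"}]},
]


def flatten(value, prefix=""):
    """Hypothetical helper: flatten nested dicts/lists into dot-separated keys.

    Array elements use their index as a path segment (Orders.0.ItemId).
    This only approximates FlattenJson; it is not NiFi code.
    """
    if isinstance(value, dict):
        items = list(value.items())
    elif isinstance(value, list):
        items = [(str(i), v) for i, v in enumerate(value)]
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else key
        flat.update(flatten(child, path))
    return flat


# After SplitJson each record is handled on its own, so each gets its own field set
field_sets = [set(flatten(record)) for record in RECORDS]
missing_from_first = sorted(field_sets[1] - field_sets[0])
print(missing_from_first)
```

Every field printed here (the Orders entries, and also ShippingAddress.PostalCode) would be absent from a schema built only from the first record.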
Answer 0 (score: 1)
Instead of using SplitJson and FlattenJson, you can use JoltTransformJSON with the following Chainr spec to flatten the whole thing without splitting it:
[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "ShippingAddress": {
          "Address1": "[&2].ShippingAddress_Address1",
          "Address2": "[&2].ShippingAddress_Address2",
          "City": "[&2].ShippingAddress_City",
          "State": "[&2].ShippingAddress_State",
          "PostalCode": "[&2].ShippingAddress_PostalCode"
        },
        "Orders": {
          "*": {
            "ItemId": "[&3].Orders_&1_ItemId",
            "OrderDate": "[&3].Orders_&1_OrderDate"
          }
        },
        "*": "[&1].&"
      }
    }
  }
]
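The spec is easier to follow next to its effect. Below is a rough plain-Python emulation of what this shift spec does to the sample input; it is not a Jolt engine, and `shift_like_spec` is a hypothetical name. It copies every ShippingAddress field, which matches the spec only when PostalCode is mapped (explicitly, or with a "*" wildcard under ShippingAddress); a spec that lists just the four other address fields drops PostalCode from the output.

```python
import json

# Sample input from the question
RECORDS = [
    {"Id": 1235, "Username": "fred1235", "Name": "Fred",
     "ShippingAddress": {"Address1": "456 Main St.", "Address2": "",
                         "City": "Durham", "State": "NC"}},
    {"Id": 1236, "Username": "larry1234", "Name": "Larry",
     "ShippingAddress": {"Address1": "789 Main St.", "Address2": "",
                         "City": "Durham", "State": "NC", "PostalCode": 277453},
     "Orders": [{"ItemId": 1111, "OrderDate": "11/11/2012"},
                {"ItemId": 2222, "OrderDate": "12/12/2012"}]},
]


def shift_like_spec(records):
    """Hypothetical emulation of the answer's shift spec; not a Jolt engine."""
    output = []
    for record in records:
        flat = {}
        for key, value in record.items():
            if key == "ShippingAddress":
                # "[&2].ShippingAddress_<field>" for each address field
                for field, v in value.items():
                    flat[f"ShippingAddress_{field}"] = v
            elif key == "Orders":
                # "[&3].Orders_&1_<field>": &1 is the index inside Orders
                for index, order in enumerate(value):
                    for field, v in order.items():
                        flat[f"Orders_{index}_{field}"] = v
            else:
                # "*": "[&1].&" copies every other top-level key unchanged
                flat[key] = value
        output.append(flat)
    return output


flattened = shift_like_spec(RECORDS)
print(json.dumps(flattened, indent=2))
```

Because the whole array stays in one FlowFile, nothing has to be merged afterwards. The two output records still have different field sets, though: only the second one has Orders_* keys.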
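I'm not sure what ConvertRecord is for here, but you no longer need MergeRecord. If this isn't the output you're looking for, please show me what you expect (for both records, the one with the Orders field and the one without) and I'll be happy to help.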
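The two records differ in whether they have Orders fields, and whatever writes the Parquet file needs one schema that accepts both. A minimal sketch, assuming you derive that schema from all flattened records rather than from a single one: any field missing from some record becomes a nullable Avro union (["null", type] with a null default). `union_schema` and `avro_type` are hypothetical helpers, not NiFi or Avro library APIs, and the Python-to-Avro type mapping (int to long, and so on) is an assumption.

```python
import json

# Abbreviated flattened records (a subset of the keys the shift spec produces)
FLATTENED = [
    {"Id": 1235, "Name": "Fred", "ShippingAddress_City": "Durham"},
    {"Id": 1236, "Name": "Larry", "ShippingAddress_City": "Durham",
     "Orders_0_ItemId": 1111, "Orders_0_OrderDate": "11/11/2012"},
]

# Checked in order: bool must come before int because bool subclasses int
AVRO_TYPES = [(bool, "boolean"), (int, "long"), (float, "double"), (str, "string")]


def avro_type(value):
    """Map a Python scalar to an Avro primitive type name (assumed mapping)."""
    for py_type, avro_name in AVRO_TYPES:
        if isinstance(value, py_type):
            return avro_name
    raise TypeError(f"unsupported value: {value!r}")


def union_schema(records, name="FlattenedRecord"):
    """Hypothetical helper: one Avro record schema covering every record.

    Assumes each field has the same type in every record that contains it.
    """
    fields = []
    for key in sorted({k for record in records for k in record}):
        present = [record[key] for record in records if key in record]
        field_type = avro_type(present[0])
        if len(present) < len(records):
            # Missing from some records: nullable, "null" first so a null default is valid
            fields.append({"name": key, "type": ["null", field_type], "default": None})
        else:
            fields.append({"name": key, "type": field_type})
    return {"type": "record", "name": name, "fields": fields}


print(json.dumps(union_schema(FLATTENED), indent=2))
```

Putting "null" first in each union matters because Avro requires a field's default value to match the first branch of its union.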