I have a DataStream[ObjectNode], read from a Kafka topic as deserialized JSON. One of the elements of this ObjectNode is actually an array of events, and that array has varying length. The incoming JSON stream looks like this:
{
    "eventType": "Impression",
    "deviceId": "359849094258487",
    "payload": {
        "vertical_name": "",
        "promo_layout_type": "aa",
        "Customer_Id": "1011851",
        "ecommerce": {
            "promoView": {
                "promotions": [{
                    "name": "/-category_icons_all",
                    "id": "300275",
                    "position": "slot_5_1",
                    "creative": "Central/Gift Card/00000001B890D1739913DDA956AB5C79775991EC"
                }, {
                    "name": "/-category_icons_all",
                    "id": "300276",
                    "position": "slot_6_1",
                    "creative": "Lifestyle/Gift Card/00000001B890D1739913DDA956AB5C79775991EC"
                }, {
                    "name": "/-category_icons_all",
                    "id": "413002",
                    "position": "slot_7_1",
                    "creative": "Uber/Deals/00000001B890D1739913DDA956AB5C79775991EC"
                }]
            }
        }
    }
}
I want to be able to explode the promotions array so that each of its elements becomes a separate message that can be written to a sink Kafka topic. Does Flink provide an explode function in the DataStream and/or Table API?
I have tried running a RichFlatMap over this stream so that I could collect individual rows, but that only gives me back a DataStream[Seq[GenericRecord]], as shown below:
class PromoMapper(schema: Schema) extends RichFlatMapFunction[node.ObjectNode, Seq[GenericRecord]] {
    override def flatMap(value: ObjectNode, out: Collector[Seq[GenericRecord]]): Unit = {
        val promos = value.get("payload").get("ecommerce").get("promoView").get("promotions").asInstanceOf[Seq[node.ObjectNode]]
        val record = for { promo <- promos } yield {
            val processedRecord: GenericData.Record = new GenericData.Record(schema)
            promo.fieldNames().asScala.foreach(f => processedRecord.put(f, promo.get(f)))
            processedRecord
        }
        out.collect(record)
    }
}
Please help.
Answer 0 (score: 0):
Using a flatmap is the right idea (not sure why you bothered with the Rich variant, but that's a detail).
It seems you should call out.collect(processedRecord) for each element inside the for loop, rather than calling it once on the Seq that the loop yields.
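
To make that concrete, here is a minimal sketch of a corrected function. The field access path, class name, and schema parameter come from the question; everything else is an assumption: a plain FlatMapFunction is used since no runtime context is needed, the promotions node is iterated as a Jackson array via elements(), and each promotion field is copied as a string with asText().

import com.fasterxml.jackson.databind.node.ObjectNode
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.flink.api.common.functions.FlatMapFunction
import org.apache.flink.util.Collector

import scala.collection.JavaConverters._

// Emits one GenericRecord per promotion instead of one Seq per event.
// Assumes the Avro schema describes a single promotion (name, id,
// position, creative) with string-typed fields.
class PromoMapper(schema: Schema) extends FlatMapFunction[ObjectNode, GenericRecord] {
    override def flatMap(value: ObjectNode, out: Collector[GenericRecord]): Unit = {
        val promotions = value
            .get("payload")
            .get("ecommerce")
            .get("promoView")
            .get("promotions") // a Jackson ArrayNode; elements() iterates its entries

        for (promo <- promotions.elements().asScala) {
            val record = new GenericData.Record(schema)
            // copy each field of this promotion into the Avro record
            promo.fieldNames().asScala.foreach(f => record.put(f, promo.get(f).asText()))
            out.collect(record) // one collect per promotion = one message per promotion
        }
    }
}

Attached via stream.flatMap(new PromoMapper(schema)), this yields a DataStream[GenericRecord] with one element per promotion, which can then be written to the sink topic as individual messages. Depending on your Flink version you may also need to supply explicit TypeInformation for GenericRecord (flink-avro's GenericRecordAvroTypeInfo, if available to you).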