How to explode a DataStream containing a JSON array into a DataStream of individual array elements

Date: 2019-05-29 18:25:48

Tags: json apache-kafka apache-flink flink-sql

I have a DataStream[ObjectNode] that is read from a Kafka topic and deserialized as JSON. One of the elements of this ObjectNode is actually an array of events, and that array has a varying length. The incoming JSON stream looks like this:

{
    "eventType": "Impression",
    "deviceId": "359849094258487",
    "payload": {
        "vertical_name": "",
        "promo_layout_type": "aa",
        "Customer_Id": "1011851",
        "ecommerce": {
            "promoView": {
                "promotions": [{
                    "name": "/-category_icons_all",
                    "id": "300275",
                    "position": "slot_5_1",
                    "creative": "Central/Gift Card/00000001B890D1739913DDA956AB5C79775991EC"
                }, {
                    "name": "/-category_icons_all",
                    "id": "300276",
                    "position": "slot_6_1",
                    "creative": "Lifestyle/Gift Card/00000001B890D1739913DDA956AB5C79775991EC"
                }, {
                    "name": "/-category_icons_all",
                    "id": "413002",
                    "position": "slot_7_1",
                    "creative": "Uber/Deals/00000001B890D1739913DDA956AB5C79775991EC"
                }]
            }
        }
    }
}

I want to be able to explode the promotions array so that each of its elements becomes a separate message that can be written to a sink Kafka topic. Does Flink provide an explode feature in the DataStream and/or Table API?

I have already tried a RichFlatMap on this stream so I could collect individual rows, but that only gives me back a DataStream[Seq[GenericRecord]], as shown below:

class PromoMapper(schema: Schema) extends RichFlatMapFunction[node.ObjectNode, Seq[GenericRecord]] {

  override def flatMap(value: ObjectNode, out: Collector[Seq[GenericRecord]]): Unit = {
    val promos = value.get("payload").get("ecommerce").get("promoView").get("promotions").asInstanceOf[Seq[node.ObjectNode]]

    val record = for { promo <- promos } yield {
      val processedRecord: GenericData.Record = new GenericData.Record(schema)
      promo.fieldNames().asScala.foreach(f => processedRecord.put(f, promo.get(f)))
      processedRecord
    }

    // Emits the whole Seq as a single element, which is why the result
    // is a DataStream[Seq[GenericRecord]] rather than one record per promotion.
    out.collect(record)
  }
}

Please help.

1 answer:

Answer 0 (score: 0)

Using a flatMap is the right idea (not sure why you bothered with a RichFlatMapFunction, but that's a detail).

It looks like you should call out.collect(processedRecord) once for each element inside the for loop, rather than calling it a single time on the Seq produced by that loop.
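A minimal sketch of that fix, assuming the same Jackson/Avro types as in the question. Note that the promotions node is a Jackson array node, so iterating its elements() is safer than casting it to a Seq (that cast would fail at runtime); the asText() call assumes the Avro schema's fields are strings, so adapt it to your actual schema:

```scala
import scala.collection.JavaConverters._

import com.fasterxml.jackson.databind.JsonNode
import com.fasterxml.jackson.databind.node.ObjectNode
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.flink.api.common.functions.FlatMapFunction
import org.apache.flink.util.Collector

// A plain FlatMapFunction is enough here. Each promotion is emitted
// individually, so the result is a DataStream[GenericRecord].
class PromoExploder(schema: Schema) extends FlatMapFunction[ObjectNode, GenericRecord] {

  override def flatMap(value: ObjectNode, out: Collector[GenericRecord]): Unit = {
    val promotions = value
      .get("payload").get("ecommerce").get("promoView").get("promotions")

    // promotions is a Jackson ArrayNode; elements() iterates its children.
    promotions.elements().asScala.foreach { promo: JsonNode =>
      val record = new GenericData.Record(schema)
      promo.fieldNames().asScala.foreach(f => record.put(f, promo.get(f).asText()))
      out.collect(record) // one downstream message per promotion
    }
  }
}
```

Applying this with `stream.flatMap(new PromoExploder(schema))` yields a DataStream[GenericRecord] whose elements can each be written to the sink Kafka topic.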