How to apply the same pattern to two different Kafka streams in Flink?

Date: 2017-07-10 22:49:24

Tags: scala apache-flink

I have the following Flink program:

object WindowedWordCount {
  val configFactory = ConfigFactory.load()

  def main(args: Array[String]) = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    val kafkaStream1 = env.addSource(new FlinkKafkaConsumer010[String](topic1, new SimpleStringSchema(), props))
      .assignTimestampsAndWatermarks(new TimestampExtractor)

    val kafkaStream2 = env.addSource(new FlinkKafkaConsumer010[String](topic2, new SimpleStringSchema(), props))
      .assignTimestampsAndWatermarks(new TimestampExtractor)

    val partitionedStream1 = kafkaStream1.keyBy(jsonString => {
      extractUserId(jsonString)
    })

    val partitionedStream2 = kafkaStream2.keyBy(jsonString => {
      extractUserId(jsonString)
    })

    //Is there a way to match the userId from partitionedStream1 and partitionedStream2 in this same pattern?
    val patternForMatchingUserId = Pattern.begin[String]("start")
        .where(stream1.getUserId() == stream2.getUserId()) //I want to do something like this

    //Is there a way to pass in partitionedStream1 and partitionedStream2 to this CEP.pattern function?
    val patternStream = CEP.pattern(partitionedStream1, patternForMatchingUserId)

    env.execute()
  }
}

In the Flink program above, I have two streams named partitionedStream1 and partitionedStream2, each keyed (keyBy) on the user ID.

I want to somehow compare data from both streams inside the patternForMatchingUserId pattern (similar to what I sketched above). Is there a way to pass both streams to the CEP.pattern function?

Something like this:

val patternStream = CEP.pattern(partitionedStream1, partitionedStream2, patternForMatchingUserId)

1 answer:

Answer (score: 2)

You cannot pass two streams to CEP, but you can pass a merged stream.

If both streams have the same type/schema, you can union them. I believe this solution fits your case:

partitionedStream1.union(partitionedStream2).keyBy(...)
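The union-then-key idea can be illustrated without a Flink cluster using plain Scala collections: concatenating the two same-schema streams corresponds to `union`, and grouping by the extracted user ID corresponds to `keyBy`. This is only a sketch; `extractUserId` here is a hypothetical stand-in for the question's helper, and the JSON parsing is deliberately naive.

```scala
// Sketch (plain Scala, no Flink): merge two same-schema event sequences and
// group by user id, mirroring partitionedStream1.union(partitionedStream2).keyBy(...).
object UnionKeyBySketch {
  // Hypothetical stand-in for the question's extractUserId helper;
  // naively pulls the value of the "userId" field out of a JSON string.
  def extractUserId(json: String): String =
    json.split("\"userId\":\"")(1).takeWhile(_ != '"')

  // "union" = concatenation, "keyBy" = groupBy on the key extractor.
  def unionAndKey(s1: Seq[String], s2: Seq[String]): Map[String, Seq[String]] =
    (s1 ++ s2).groupBy(extractUserId)
}
```

After the `union`, a single `CEP.pattern(...)` call sees events for the same user ID from both topics in one keyed stream, so a pattern can match across them.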

If they have different schemas, you can convert them into a single stream with some custom logic inside a coFlatMap.
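The coFlatMap approach boils down to normalizing two differently shaped event types into one common type before keying; the sketch below simulates that normalization step with plain Scala (no Flink). The case classes and field names are hypothetical, purely for illustration.

```scala
// Sketch (plain Scala, no Flink): map two differently-shaped event types into
// one common type, which is what the two flatMap sides of a CoFlatMapFunction
// would do on a connected stream before keyBy/CEP.
object CoFlatMapSketch {
  // Hypothetical source schemas with different user-id field names.
  case class ClickEvent(userId: String, url: String)
  case class PurchaseEvent(uid: String, amountCents: Long)

  // Common target type both sides are normalized into.
  case class UnifiedEvent(userId: String, kind: String)

  // One normalization function per input side, outputs merged into one stream.
  def normalize(clicks: Seq[ClickEvent],
                purchases: Seq[PurchaseEvent]): Seq[UnifiedEvent] =
    clicks.map(c => UnifiedEvent(c.userId, "click")) ++
      purchases.map(p => UnifiedEvent(p.uid, "purchase"))
}
```

Once both sides emit `UnifiedEvent`, the result is a single stream that can be keyed by `userId` and handed to `CEP.pattern` exactly as in the union case.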