Flink: emit an event when the earlier matching event of a pair is not found

Date: 2016-12-03 15:00:52

Tags: apache-flink flink-streaming flink-cep

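I have two event streams: one emits an event that marks the start of an item's lifecycle, and the other emits an event that marks the end of an item's lifecycle. (The streams can be joined on itemId.)

How can I emit a new event in Flink for each itemId that has only an "end of lifecycle" event and no corresponding start? (The start and end events may be hours or days apart.)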

1 answer:

Answer (score: 1)

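You can implement this with a stateful FlatMapFunction on a KeyedStream: union both streams, key the result by itemId so that every ID gets its own state, and remember per ID whether a start event has been seen.

The following snippet (Flink Scala DataStream API) should do what you need.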

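import org.apache.flink.streaming.api.scala._

// stream1 emits start-of-lifecycle events, stream2 emits end-of-lifecycle events
val stream1: DataStream[Event1] = ???
val stream2: DataStream[Event2] = ???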

// map both streams to their ID and an isStart flag so that they share a common type
val ids1: DataStream[(Int, Boolean)] = stream1.map(e => (e.id, true) )
val ids2: DataStream[(Int, Boolean)] = stream2.map(e => (e.id, false) )

// union both streams
val ids = ids1.union(ids2)

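// use a stateful FlatMapFunction to emit the IDs of items whose end-of-life (eol)
// event arrives without a preceding beginning-of-life (bol) event
val onlyEOL: DataStream[Int] = ids
  // organize stream by ID so that each ID has its own state
  .keyBy(_._1)
  // use stateful FlatMapFunction to check that bol arrived before eol
  // note: IDs that only ever see a bol event keep their state indefinitely, and
  // an eol that overtakes its bol (the unioned streams are not ordered relative
  // to each other) is reported as "eol only"
  .flatMapWithState[Int, Boolean] {
    (value: (Int, Boolean), state: Option[Boolean]) =>
      if (value._2) {
        // bol event -> emit nothing and set state to true
        (List.empty[Int], Some(true))
      } else if (state.contains(true)) {
        // eol event, bol was seen before -> emit nothing and remove state
        (List.empty[Int], None)
      } else {
        // eol event, bol was NOT seen before -> emit ID and keep state empty
        (List(value._1), None)
      }
  }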
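To check the state transitions without running a Flink job, the same per-key logic can be replayed on an in-memory sequence of `(id, isStart)` pairs. This is a minimal sketch under assumptions: `simulateFlatMapWithState` is a hypothetical helper that only mimics the per-key `Option` state contract of `flatMapWithState` (state starts as `None`, returning `Some(s)` stores it, `None` clears it). It is not a Flink API and ignores parallelism, checkpointing and real arrival order. The example input lets item 3's end overtake its start, and it returns the leftover state so that the two caveats noted in the snippet's comments become visible.

```scala
import scala.collection.mutable

// Hypothetical helper (not part of Flink): applies `fun` to each event in order,
// keeping one Option-valued state per key, like flatMapWithState does per key.
def simulateFlatMapWithState[K, T, R, S](
    events: Seq[T],
    key: T => K,
    fun: (T, Option[S]) => (List[R], Option[S])
): (List[R], Map[K, S]) = {
  val state = mutable.Map.empty[K, S]
  val out = events.toList.flatMap { e =>
    val k = key(e)
    val (emitted, next) = fun(e, state.get(k))
    next match {
      case Some(s) => state.update(k, s) // store new state for this key
      case None    => state.remove(k)    // clear state for this key
    }
    emitted
  }
  (out, state.toMap) // emitted records plus state left behind per key
}

// Same decision logic as the flatMapWithState lambda above.
val onlyEol: ((Int, Boolean), Option[Boolean]) => (List[Int], Option[Boolean]) =
  (value, state) =>
    if (value._2) (Nil, Some(true))             // bol: remember it
    else if (state.contains(true)) (Nil, None)  // eol after bol: drop
    else (List(value._1), None)                 // eol without bol: emit ID

// (itemId, isStart) in arrival order; item 3's eol arrives before its bol.
val ids = Seq((1, true), (2, false), (1, false), (3, false), (3, true))
val (onlyEolIds, leftover) =
  simulateFlatMapWithState[Int, (Int, Boolean), Int, Boolean](ids, _._1, onlyEol)
// onlyEolIds == List(2, 3); leftover == Map(3 -> true)
```

Item 2 is correctly reported, but item 3 is reported as well because its end overtook its start, and its late start event then stays in state with nothing to clear it.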