A question about Flink's DataStream timeWindow [help]

Time: 2019-09-08 16:16:09

Tags: scala apache-flink

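I'm a Flink beginner. I can implement a basic Flink streaming job, and I have also practiced using timeWindow on a KeyedStream. Now I want to use timeWindow on a plain DataStream to group records into batches. I tried the code below, but it doesn't work, and I don't know how to get the result I expect.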

val resultDataStream: DataStream[HBaseOperation] = env
  .addSource(kafkaUserEventSource)
  .assignTimestampsAndWatermarks(new Launcher.CustomWatermarkExtractor(Time.hours(24)))
  .flatMap { line =>
    import scala.collection.JavaConverters._
    val tableInfo: mutable.Buffer[TableInfo] = JSON.parseArray(line.columnValueList, classOf[TableInfo]).asScala
    line.eventType match {
      case EventType.INSERT => tableInfo.map {
        row => HBaseOperation(line.eventType, s"${line.dbName}.${line.tableName}", HBASE_CFNAME, tableInfo(0).columnValue, row.columnName, row.columnValue)
      }
      case EventType.UPDATE => tableInfo.filter(_.isValid).map {
        row => HBaseOperation(line.eventType, s"${line.dbName}.${line.tableName}", HBASE_CFNAME, tableInfo(0).columnValue, row.columnName, row.columnValue)
      }
      case EventType.DELETE => List(HBaseOperation(line.eventType, s"${line.dbName}.${line.tableName}", HBASE_CFNAME, tableInfo(0).columnValue, null, null))
    }
  }
resultDataStream.print()//Here it works

//batch operations
val value: AllWindowedStream[HBaseOperation, TimeWindow] = resultDataStream.timeWindowAll(Time.seconds(5))
val resultDataStream2: DataStream[BatchHBaseOperation] = value.apply(new RichAllWindowFunction[HBaseOperation, BatchHBaseOperation, TimeWindow] {
  override def apply(window: TimeWindow, input: Iterable[HBaseOperation], out: Collector[BatchHBaseOperation]): Unit = {
    val ops: ListBuffer[HBaseOperation] = ListBuffer[HBaseOperation]()
    input.foreach(op => ops += op)
    out.collect(BatchHBaseOperation(ops.toList))
  }
})//Doesn't work
resultDataStream2.print()
//resultDataStream2.addSink(HBaseUtil.putMapData(_))
env.execute("KafkaHBaseApp")

1 answer:

Answer 0: (score: 0)

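The details of your watermark generator aren't shown, but if it is based on Flink's bounded-out-of-orderness watermarking, then CustomWatermarkExtractor(Time.hours(24)) means that the first time window won't fire until 24 hours (plus 5 seconds) of event-time data has been processed. That would explain why it appears not to work.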
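To make the arithmetic concrete, here is a minimal sketch in plain Scala with no Flink dependency. It assumes the extractor behaves like a bounded-out-of-orderness generator (watermark = largest timestamp seen minus the allowed delay) and that an event-time window [start, end) fires once the watermark reaches end - 1. The object name, helper names, and sample timestamps are all hypothetical and only model the behavior; they are not Flink APIs.

```scala
// Simplified model of event-time window firing; not Flink code.
object WatermarkDelaySketch {
  val windowSizeMs: Long = 5 * 1000L                   // Time.seconds(5)
  val maxOutOfOrdernessMs: Long = 24 * 60 * 60 * 1000L // Time.hours(24)

  // Start of the tumbling window containing ts (offset 0, non-negative ts).
  def windowStart(ts: Long): Long = ts - (ts % windowSizeMs)

  // Assumed bounded-out-of-orderness rule: the watermark lags the
  // largest timestamp seen so far by the configured delay.
  def watermarkFor(maxSeenTs: Long): Long = maxSeenTs - maxOutOfOrdernessMs

  // The window [start, end) fires once the watermark reaches end - 1.
  def windowFires(windowEnd: Long, maxSeenTs: Long): Boolean =
    watermarkFor(maxSeenTs) >= windowEnd - 1

  def main(args: Array[String]): Unit = {
    val firstEventTs = 12000L // hypothetical event timestamp in ms
    val end = windowStart(firstEventTs) + windowSizeMs
    // Smallest event timestamp that must arrive before the window fires.
    val requiredTs = end - 1 + maxOutOfOrdernessMs
    println(s"window end: $end")
    println(s"fires right after the first event: ${windowFires(end, firstEventTs)}")
    println(s"fires once an event with timestamp $requiredTs arrives: ${windowFires(end, requiredTs)}")
  }
}
```

Under these assumptions, a window that starts at time t stays open until an event with a timestamp of roughly t + 5 seconds + 24 hours has been seen, which matches the delay described above. A much smaller out-of-orderness bound would let the 5-second windows fire soon after their end time passes.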