I'm a beginner with Flink. I can implement a Flink stream, and I have also practiced timeWindow on a KeyedStream. Now I want to use Flink's timeWindow on a DataStream to implement a batching operation. I tried some code, but it just doesn't work, and I don't know how to achieve the expected effect.
val resultDataStream: DataStream[HBaseOperation] = env
  .addSource(kafkaUserEventSource)
  .assignTimestampsAndWatermarks(new Launcher.CustomWatermarkExtractor(Time.hours(24)))
  .flatMap { line =>
    import scala.collection.JavaConverters._
    val tableInfo: mutable.Buffer[TableInfo] = JSON.parseArray(line.columnValueList, classOf[TableInfo]).asScala
    line.eventType match {
      case EventType.INSERT => tableInfo.map { row =>
        HBaseOperation(line.eventType, s"${line.dbName}.${line.tableName}", HBASE_CFNAME, tableInfo(0).columnValue, row.columnName, row.columnValue)
      }
      case EventType.UPDATE => tableInfo.filter(_.isValid).map { row =>
        HBaseOperation(line.eventType, s"${line.dbName}.${line.tableName}", HBASE_CFNAME, tableInfo(0).columnValue, row.columnName, row.columnValue)
      }
      case EventType.DELETE => List(HBaseOperation(line.eventType, s"${line.dbName}.${line.tableName}", HBASE_CFNAME, tableInfo(0).columnValue, null, null))
    }
  }
resultDataStream.print() // Here it works

// Batch operations
val value: AllWindowedStream[HBaseOperation, TimeWindow] = resultDataStream.timeWindowAll(Time.seconds(5))
val resultDataStream2: DataStream[BatchHBaseOperation] = value.apply(new RichAllWindowFunction[HBaseOperation, BatchHBaseOperation, TimeWindow] {
  override def apply(window: TimeWindow, input: Iterable[HBaseOperation], out: Collector[BatchHBaseOperation]): Unit = {
    val ops: ListBuffer[HBaseOperation] = ListBuffer[HBaseOperation]()
    input.foreach(op => ops += op)
    out.collect(BatchHBaseOperation(ops.toList))
  }
}) // Doesn't work
resultDataStream2.print()
//resultDataStream2.addSink(HBaseUtil.putMapData(_))
env.execute("KafkaHBaseApp")
Answer 0 (score: 0):
You haven't shared the details of your watermark generator, but if it is based on Flink's bounded-out-of-orderness watermarking, then CustomWatermarkExtractor(Time.hours(24)) means that no time window will fire until 24 hours (plus 5 seconds) of data has been processed for the first time. That would explain why it doesn't seem to work.
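The arithmetic behind that explanation can be sketched without Flink. The following is a hypothetical, self-contained simulation of how a bounded-out-of-orderness watermark interacts with a 5-second tumbling window; the helper names (`watermark`, `windowFires`) are illustrative and are not Flink API:

```scala
object WatermarkSketch {
  // A bounded-out-of-orderness generator emits: max event timestamp seen - bound.
  def watermark(maxTimestampMs: Long, boundMs: Long): Long =
    maxTimestampMs - boundMs

  // A tumbling event-time window [start, end) fires once the watermark reaches its end.
  def windowFires(watermarkMs: Long, windowEndMs: Long): Boolean =
    watermarkMs >= windowEndMs

  def main(args: Array[String]): Unit = {
    val dayMs     = 24L * 60 * 60 * 1000 // the 24-hour bound from the question
    val windowEnd = 5000L                // first 5-second window ends at t = 5s

    // After an event at t = 5s, the watermark is still 24 hours behind,
    // so the first window cannot fire yet:
    assert(!windowFires(watermark(5000L, dayMs), windowEnd))

    // Only once an event with timestamp >= 24h + 5s arrives does the
    // watermark pass t = 5s and the first window fire:
    assert(windowFires(watermark(dayMs + 5000L, dayMs), windowEnd))

    println("sketch checks passed")
  }
}
```

With a bound that large, every window stays open until a full day of event time has been observed, which matches the behavior described as "doesn't work"; a bound of a few seconds would let windows fire promptly.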