EventTime window not firing

Time: 2018-06-14 20:30:22

Tags: scala apache-flink flink-streaming

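I'm trying to set up a streaming solution that uses EventTime and consumes from Kafka. My data can arrive up to one minute late, so my watermark lags one minute behind, and it isn't working properly. I have 30-second windows. Here is my setup: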

KafkaConsumer.scala

object KafkaConsumer {
  def main(args: Array[String]) {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    val properties = getServerProperties
    val consumer = new FlinkKafkaConsumer010[ObjectNode](getKafkaTopic, new JsonNodeDeserializationSchema, properties)
    consumer.setStartFromGroupOffsets()

    val stream = env.addSource(consumer)
      .assignTimestampsAndWatermarks(new WMAssigner)

    stream
      .keyBy { jsonEvent =>
        val key = jsonEvent.findValue("key").toString replaceAll("\"","")
        key.toString
      }
      .window(TumblingEventTimeWindows.of(Time.seconds(30)))
      .process { new SessionWindowProcessor }
      .print
    env.execute("EventTime Test")
  }
}

WMAssigner.scala

class WMAssigner extends AssignerWithPeriodicWatermarks[ObjectNode] {

  var currentMaxTimestamp: Long = 0
  var currentWaterMark: Long = 0

  override def extractTimestamp(element: ObjectNode, previousElementTimestamp: Long): Long = {
    val lStr = element.findValue("ts").toString replaceAll("\"", "")
    currentMaxTimestamp = if(currentMaxTimestamp > lStr.toLong) currentMaxTimestamp else lStr.toLong
    currentMaxTimestamp
  }

  override def getCurrentWatermark: Watermark = {
      currentWaterMark = if(currentMaxTimestamp - 60000 < 0) 0 else currentMaxTimestamp - 60000
      new Watermark(currentWaterMark)

  }
}

SessionWindowProcessor.scala

class SessionWindowProcessor extends ProcessWindowFunction[ObjectNode,Long,String,TimeWindow] {
  override def process(key: String, context: Context, elements: Iterable[ObjectNode], out: Collector[Long]): Unit = {
    println("Processing!")
    var maxVal: Long = 0
    elements.foreach( value => {
      val valStr = value.findValue("value").toString replaceAll("\"","")
      maxVal = if(valStr.toLong > maxVal) valStr.toLong else maxVal
    })
    out.collect(maxVal)
  }
}

Sample data:

"{\"ts\": 0,     \"key\": \"a\", \"value\": 0}",
"{\"ts\": 15000, \"key\": \"a\", \"value\": 1}",
"{\"ts\": 30000, \"key\": \"a\", \"value\": 2}",
"{\"ts\": 90001, \"key\": \"a\", \"value\": 3}" 

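I would expect my first window to fire after the fourth message, with a value of 1 or 2 (I'm not sure whether the window end is inclusive). Unfortunately, I don't even see the println in SessionWindowProcessor.scala being hit. What am I doing wrong that keeps my window from firing?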
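A small model can settle the inclusivity question. The sketch below is plain Scala, not Flink code. It assumes a window offset of 0, non-negative timestamps, and all four records arriving on a single input (one partition, parallelism 1). It also recomputes the watermark after every record, whereas Flink emits watermarks periodically. TumblingWindowModel and its methods are hypothetical names.

```scala
import scala.collection.mutable

// Minimal model of a 30 s tumbling event-time window fed by the question's WMAssigner.
// Assumptions: window offset 0, non-negative timestamps, a single input, and the
// watermark re-evaluated after every record (in Flink it is emitted periodically).
object TumblingWindowModel {
  val WindowSizeMs: Long = 30000L // TumblingEventTimeWindows.of(Time.seconds(30))
  val MaxLagMs: Long = 60000L     // the 60 s lag hard-coded in WMAssigner

  // Windows are half-open, [start, start + size), so ts = 30000 opens the next window.
  def windowStart(ts: Long): Long = ts - (ts % WindowSizeMs)

  // A window fires once the watermark reaches its last millisecond, end - 1.
  def fires(start: Long, watermark: Long): Boolean =
    watermark >= start + WindowSizeMs - 1

  // Same formula as WMAssigner.getCurrentWatermark: max timestamp minus 60 s, floored at 0.
  def watermark(maxTs: Long): Long = math.max(maxTs - MaxLagMs, 0L)

  // Replays (ts, value) records in order and returns (windowStart, max value)
  // for every window that fires; windows still open at the end are not reported.
  def replay(records: Seq[(Long, Long)]): List[(Long, Long)] = {
    val open = mutable.LinkedHashMap.empty[Long, Long]
    val fired = mutable.ListBuffer.empty[(Long, Long)]
    var maxTs = Long.MinValue
    for ((ts, value) <- records) {
      val start = windowStart(ts)
      open(start) = math.max(open.getOrElse(start, Long.MinValue), value)
      maxTs = math.max(maxTs, ts)
      val wm = watermark(maxTs)
      // Materialize the keys first so that removing fired windows is safe.
      for (s <- open.keys.toList.sorted if fires(s, wm)) {
        fired += (s -> open(s))
        open.remove(s)
      }
    }
    fired.toList
  }
}
```

Replaying the sample as `Seq((0L, 0L), (15000L, 1L), (30000L, 2L), (90001L, 3L))` gives `List((0, 1))`. The window [0, 30000) fires when the fourth record lifts the watermark to 30001, and its maximum is 1, because the record with ts 30000 belongs to [30000, 60000). Under these single-input assumptions the window should fire, so the missing output has to come from somewhere else.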

2 answers:

Answer 0 (score: 2)

How many partitions does your Kafka topic have? If it has more than one, the problem is that Flink has to see a timestamp of at least 90000 on every partition before it can emit a watermark of 30000. So you either have to add more data, so that every partition has an element with a timestamp of at least 90000, or set the number of partitions of your Kafka topic to 1. Here is more information about watermarks and the Kafka connector.
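The stall described in this answer comes from one rule: an operator with several inputs only advances its watermark to the minimum of its inputs' watermarks. Here is a plain-Scala sketch of that rule. It assumes one WMAssigner-style assigner per input and a hypothetical placement of the whole sample on partition 0. PartitionWatermarkModel is an illustrative name, not a Flink class.

```scala
// Sketch of why a multi-partition topic can stall event-time windows: the window operator's
// watermark is the minimum over all of its inputs. Assumptions: one WMAssigner-style
// assigner per input, and a hypothetical placement of the sample data on partition 0.
object PartitionWatermarkModel {
  val MaxLagMs: Long = 60000L
  val FirstWindowLastMs: Long = 29999L // last millisecond of the window [0, 30000)

  // Watermark reported by one input. An input that has seen no records reports 0,
  // just like WMAssigner while currentMaxTimestamp is still at its initial value.
  def inputWatermark(maxTs: Option[Long]): Long =
    maxTs.map(ts => math.max(ts - MaxLagMs, 0L)).getOrElse(0L)

  // The window operator only advances to the smallest watermark among its inputs.
  def combinedWatermark(maxTsPerPartition: Map[Int, Option[Long]]): Long = {
    require(maxTsPerPartition.nonEmpty, "need at least one input")
    maxTsPerPartition.values.map(inputWatermark).min
  }

  def firstWindowFires(maxTsPerPartition: Map[Int, Option[Long]]): Boolean =
    combinedWatermark(maxTsPerPartition) >= FirstWindowLastMs
}
```

With all the data on one partition, `firstWindowFires(Map(0 -> Some(90001L)))` is true. Add an empty second partition (`1 -> None`) and the combined watermark drops to 0, so nothing fires. It fires again once that partition also sees a timestamp of 90000.

In this model the inputs stand for Kafka partitions. With the assigner applied to the DataStream, as in the question, they are more precisely the parallel source subtasks, and a subtask that receives no records stalls the windows in the same way.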

Answer 1 (score: 2)

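For what it's worth, your extractTimestamp() method should return lStr.toLong, not currentMaxTimestamp. It makes sense for this method to keep track of currentMaxTimestamp so that the value can be used to generate watermarks. The job of extractTimestamp itself, however, is to give Flink the event-time timestamp of each individual stream element.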
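The difference only shows up with out-of-order input, so here is a plain-Scala stand-in for the assigner that can run in either mode. Extractor is a hypothetical class, not Flink's AssignerWithPeriodicWatermarks. Its input is already reduced to a Long timestamp instead of an ObjectNode.

```scala
// Contrast between the question's extractTimestamp, which stamps every element with the
// running maximum, and the suggested fix, which returns the element's own timestamp.
// Plain Scala stand-in: Extractor is a hypothetical class, not Flink's interface.
object TimestampModel {
  final class Extractor(returnOwnTimestamp: Boolean, maxLagMs: Long) {
    private var currentMaxTimestamp: Long = 0L

    // Always track the max for watermarking; only the return value differs.
    def extractTimestamp(ts: Long): Long = {
      currentMaxTimestamp = math.max(currentMaxTimestamp, ts)
      if (returnOwnTimestamp) ts else currentMaxTimestamp
    }

    // Same watermark rule as WMAssigner: max timestamp minus the lag, floored at 0.
    def currentWatermark(): Long = math.max(currentMaxTimestamp - maxLagMs, 0L)
  }

  // Start of the tumbling window (30 s by default) that an element is assigned to.
  def windowStart(ts: Long, windowSizeMs: Long = 30000L): Long = ts - (ts % windowSizeMs)
}
```

Feed the out-of-order pair 40000, 10000 to both modes. The running-max version stamps the second element 40000, which puts it in window [30000, 60000) instead of [0, 30000). The fixed version keeps 10000. With in-order data such as the sample, both versions return identical timestamps.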

However, I don't believe this explains why you aren't seeing any output, at least not if your data is actually in timestamp order, as it is in the sample.

Also note that you could use BoundedOutOfOrdernessTimestampExtractor instead, which is simpler to use.
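BoundedOutOfOrdernessTimestampExtractor is simpler because it does the max-tracking and watermark bookkeeping for you. In a real job you would subclass `BoundedOutOfOrdernessTimestampExtractor[ObjectNode]`, pass `Time.minutes(1)` to its constructor, and override only the one-argument `extractTimestamp(element)`.

The plain-Scala class below mirrors that bookkeeping as I understand Flink 1.x's implementation, so it can run without Flink. BoundedLagExtractor, onElement, Event and EventExtractor are hypothetical names.

```scala
// Plain-Scala mirror of the bookkeeping BoundedOutOfOrdernessTimestampExtractor performs,
// based on my reading of Flink 1.x; BoundedLagExtractor and Event are hypothetical names.
abstract class BoundedLagExtractor[T](maxOutOfOrdernessMs: Long) {
  require(maxOutOfOrdernessMs >= 0, "lag must be non-negative")

  // Starting here keeps currentMax - maxOutOfOrdernessMs from underflowing before any data.
  private var currentMax: Long = Long.MinValue + maxOutOfOrdernessMs
  private var lastEmitted: Long = Long.MinValue

  // The only method a user implements: pull the event time out of an element.
  def extractTimestamp(element: T): Long

  // Called per element: returns the element's own timestamp and tracks the maximum.
  final def onElement(element: T): Long = {
    val ts = extractTimestamp(element)
    if (ts > currentMax) currentMax = ts
    ts
  }

  // Called periodically: watermark = max timestamp seen - bound, never moving backwards.
  final def currentWatermark(): Long = {
    val candidate = currentMax - maxOutOfOrdernessMs
    if (candidate >= lastEmitted) lastEmitted = candidate
    lastEmitted
  }
}

final case class Event(ts: Long, key: String, value: Long)

// Equivalent of the question's WMAssigner with a 60 s bound, minus the hand-written state.
final class EventExtractor extends BoundedLagExtractor[Event](60000L) {
  override def extractTimestamp(element: Event): Long = element.ts
}
```

Before any data arrives, the watermark is `Long.MinValue` rather than 0. After the four sample events it is 30001, the same value WMAssigner produces. Each event keeps its own timestamp, which also avoids the running-max bug.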