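I am trying to set up a streaming solution using EventTime. I will be consuming from Kafka. My data can be up to one minute late (hence my watermark is delayed by one minute) and can arrive out of order. I have 30-second windows. I have the following setup: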
KafkaConsumer.scala
object KafkaConsumer {
  def main(args: Array[String]) {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    val properties = getServerProperties
    val consumer = new FlinkKafkaConsumer010[ObjectNode](getKafkaTopic, new JsonNodeDeserializationSchema, properties)
    consumer.setStartFromGroupOffsets()

    val stream = env.addSource(consumer)
      .assignTimestampsAndWatermarks(new WMAssigner)

    stream
      .keyBy { jsonEvent =>
        val key = jsonEvent.findValue("key").toString replaceAll("\"","")
        key.toString
      }
      .window(TumblingEventTimeWindows.of(Time.seconds(30)))
      .process { new SessionWindowProcessor }
      .print

    env.execute("EventTime Test")
  }
}
WMAssigner.scala
class WMAssigner extends AssignerWithPeriodicWatermarks[ObjectNode] {
  var currentMaxTimestamp: Long = 0
  var currentWaterMark: Long = 0

  override def extractTimestamp(element: ObjectNode, previousElementTimestamp: Long): Long = {
    val lStr = element.findValue("ts").toString replaceAll("\"", "")
    currentMaxTimestamp = if(currentMaxTimestamp > lStr.toLong) currentMaxTimestamp else lStr.toLong
    currentMaxTimestamp
  }

  override def getCurrentWatermark: Watermark = {
    currentWaterMark = if(currentMaxTimestamp - 60000 < 0) 0 else currentMaxTimestamp - 60000
    new Watermark(currentWaterMark)
  }
}
SessionWindowProcessor.scala
class SessionWindowProcessor extends ProcessWindowFunction[ObjectNode,Long,String,TimeWindow] {
  override def process(key: String, context: Context, elements: Iterable[ObjectNode], out: Collector[Long]): Unit = {
    println("Processing!")
    var maxVal: Long = 0
    elements.foreach( value => {
      val valStr = value.findValue("value").toString replaceAll("\"","")
      maxVal = if(valStr.toLong > maxVal) valStr.toLong else maxVal
    })
    out.collect(maxVal)
  }
}
Sample data:
"{\"ts\": 0, \"key\": \"a\", \"value\": 0}",
"{\"ts\": 15000, \"key\": \"a\", \"value\": 1}",
"{\"ts\": 30000, \"key\": \"a\", \"value\": 2}",
"{\"ts\": 90001, \"key\": \"a\", \"value\": 3}"
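I would expect that once the fourth message arrives, my first window fires with a value of 1 or 2 (I am not sure how inclusion at the window boundary works). Unfortunately, I am not even seeing the println in SessionWindowProcessor.scala fire. What am I doing wrong here that keeps my windows from firing?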
Answer 0 (score: 2)
How many partitions does your Kafka topic have? If it has more than one partition, the problem is that Flink has to see a timestamp exceeding 90000 on every partition before it can emit a watermark with value 30000. You therefore either have to add more data, so that every partition has an element with a timestamp greater than 90000, or set the number of partitions of your Kafka topic to 1. The Flink documentation on watermarks and the Kafka connector has more information.
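The effect can be reproduced with a simplified model in plain Scala. This is only a sketch, not Flink's actual implementation. It assumes the 60-second lag from the question, that the watermark seen by the window operator is the minimum over all partitions, and that a tumbling window [start, end) fires once the watermark reaches end - 1. The names WatermarkModel, partitionWatermark, combinedWatermark and windowFires are hypothetical, and the split of the sample data across two partitions is made up for illustration.

```scala
object WatermarkModel {
  val lagMs = 60000L     // watermark lag used by WMAssigner in the question
  val windowMs = 30000L  // tumbling window size from the question

  // Watermark of one partition: highest timestamp seen minus the lag, floored at 0.
  // A partition that has not seen any data yet stays at 0.
  def partitionWatermark(timestamps: Seq[Long]): Long =
    if (timestamps.isEmpty) 0L else math.max(0L, timestamps.max - lagMs)

  // Assumption of this model: the combined watermark is the minimum over all partitions.
  def combinedWatermark(partitions: Seq[Seq[Long]]): Long =
    partitions.map(partitionWatermark).min

  // A window [windowStart, windowStart + windowMs) fires once the watermark
  // reaches the last timestamp that still belongs to the window.
  def windowFires(windowStart: Long, watermark: Long): Boolean =
    watermark >= windowStart + windowMs - 1

  def main(args: Array[String]): Unit = {
    // All four sample events in a single partition.
    val onePartition = Seq(Seq(0L, 15000L, 30000L, 90001L))
    // Hypothetical split of the same events over two partitions.
    val twoPartitions = Seq(Seq(0L, 30000L), Seq(15000L, 90001L))

    println(windowFires(0L, combinedWatermark(onePartition)))
    println(windowFires(0L, combinedWatermark(twoPartitions)))
  }
}
```

With a single partition the watermark reaches 30001 and the window starting at 0 fires. With the two-partition split, the first partition has only seen timestamps up to 30000, so its watermark is still 0 and holds the combined watermark back.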
Answer 1 (score: 2)
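For what it's worth, your extractTimestamp() method should return lStr.toLong, not currentMaxTimestamp. It makes sense for this method to keep track of currentMaxTimestamp so that the value is available for generating watermarks, but the role of extractTimestamp is to give Flink the event-time timestamp of each stream element.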
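However, I don't believe this explains why you aren't seeing any output, certainly not if your data actually arrives in timestamp order, as it does in the sample.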
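To make the difference concrete, here is a small sketch in plain Scala, with no Flink dependency, that compares the two return values. The names TimestampModel, buggyTimestamps, fixedTimestamps and windowStart are hypothetical, and the out-of-order input is invented for illustration; it is not part of the question's sample data.

```scala
object TimestampModel {
  val windowMs = 30000L  // tumbling window size from the question

  // Mirrors the question's extractTimestamp: every element is stamped
  // with the running maximum of all timestamps seen so far.
  def buggyTimestamps(ts: Seq[Long]): Seq[Long] =
    ts.scanLeft(0L)((maxSoFar, t) => math.max(maxSoFar, t)).tail

  // Suggested behaviour: every element keeps its own timestamp.
  def fixedTimestamps(ts: Seq[Long]): Seq[Long] = ts

  // Start of the tumbling window that contains a non-negative timestamp.
  def windowStart(t: Long): Long = t - (t % windowMs)

  def main(args: Array[String]): Unit = {
    val inOrder = Seq(0L, 15000L, 30000L, 90001L)  // sample data from the question
    val outOfOrder = Seq(0L, 40000L, 15000L)       // invented out-of-order input

    // For in-order data both variants assign identical timestamps.
    println(buggyTimestamps(inOrder) == fixedTimestamps(inOrder))
    // For out-of-order data the late element ends up in a different window.
    println(buggyTimestamps(outOfOrder).map(windowStart))
    println(fixedTimestamps(outOfOrder).map(windowStart))
  }
}
```

For the in-order sample both variants agree, which is why this bug alone does not explain the missing output. For the out-of-order input, the element with timestamp 15000 is stamped 40000 by the running-maximum variant and lands in the window starting at 30000 instead of the one starting at 0.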
Also note that you could be using a BoundedOutOfOrdernessTimestampExtractor, which is simpler to use.
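A minimal sketch of what that could look like, assuming the same "ts" field and one-minute bound as in the question. The class name BoundedWMAssigner is hypothetical, and the ObjectNode import is an assumption: its package depends on the Flink version and on whether Jackson is shaded, so use the one WMAssigner.scala already imports. It needs the Flink streaming dependency on the classpath.

```scala
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.windowing.time.Time
// Assumption: shaded Jackson package; replace with the ObjectNode import your job already uses.
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ObjectNode

// Hypothetical replacement for WMAssigner. The base class tracks the maximum
// timestamp and emits watermarks that lag it by the given bound (one minute here).
class BoundedWMAssigner
    extends BoundedOutOfOrdernessTimestampExtractor[ObjectNode](Time.minutes(1)) {

  // Return the element's own event-time timestamp, not a running maximum.
  override def extractTimestamp(element: ObjectNode): Long =
    element.findValue("ts").asLong()
}
```

It would be plugged in the same way as before, with .assignTimestampsAndWatermarks(new BoundedWMAssigner).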