Can someone help me understand when and how windowing (session windows in particular) happens in Flink, i.e. how the samples are processed?
For example, say there is a continuous stream of events flowing in, where the events are requests into an application and the responses the application provides. As part of the Flink processing, we need to measure how long each request took.
I understand that there are time windows configured with some interval that fire every n seconds; once that time has elapsed, all the events in that time window are aggregated.
For example: assume the defined time window is 30 seconds. If one event arrives at time t and another at t+30, both will be processed, but an event arriving at t+31 will be ignored.
Please correct me if the above is wrong.
A question on the above: say one event arrives at time t and another at t+3 — will the window still wait the full 30 seconds before aggregating and finalizing the results?
Now, how does this work in the case of session windows? If events are processed individually, and at deserialization the broker timestamp is used as the session_id for each individual event, will a session window be created per event? If so, do we need to treat request and response events differently, since otherwise the response event would get a session window of its own?
I will post the sample (written in Java) I am playing with shortly, but any input on the points above would help!
DTO:
public class IncomingEvent {
    private String id;
    private String eventId;
    private Date timestamp;
    private String component;
    // fields populated by the deserializer below
    private long offset;
    private String topic;
    private int partition;
    private long brokerTimestamp;
    // getters and setters
}

public class FinalOutPutEvent {
    private String id;
    private long timeTaken;
    // getters and setters
}
================================================ Deserialization of the incoming event:
public class IncomingEventDeserializationScheme implements KafkaDeserializationSchema<IncomingEvent> {
    private ObjectMapper mapper;

    public IncomingEventDeserializationScheme(ObjectMapper mapper) {
        this.mapper = mapper;
    }

    @Override
    public TypeInformation<IncomingEvent> getProducedType() {
        return TypeInformation.of(IncomingEvent.class);
    }

    @Override
    public boolean isEndOfStream(IncomingEvent nextElement) {
        return false;
    }

    @Override
    public IncomingEvent deserialize(ConsumerRecord<byte[], byte[]> record) throws Exception {
        if (record.value() == null) {
            return null;
        }
        try {
            IncomingEvent event = mapper.readValue(record.value(), IncomingEvent.class);
            if (event != null) {
                event.setOffset(record.offset());
                event.setTopic(record.topic());
                event.setPartition(record.partition());
                event.setBrokerTimestamp(record.timestamp());
            }
            return event;
        } catch (Exception e) {
            return null;
        }
    }
}
================================================
public class MyEventJob {
    private static final ObjectMapper mapper = new ObjectMapper();

    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        MyEventJob eventJob = new MyEventJob();
        InputStream inStream = eventJob.getFileFromResources("myConfig.properties");
        ParameterTool parameter = ParameterTool.fromPropertiesFile(inStream);
        Properties properties = parameter.getProperties();
        Integer timePeriodBetweenEvents = 120;
        String outWardTopicHostedOnServer = "localhost:9092";
        DataStreamSource<IncomingEvent> stream = env.addSource(new FlinkKafkaConsumer<>("my-input-topic", new IncomingEventDeserializationScheme(mapper), properties));
        SingleOutputStreamOperator<IncomingEvent> filteredStream = stream
            .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<IncomingEvent>() {
                long eventTime;

                @Override
                public long extractTimestamp(IncomingEvent element, long previousElementTimestamp) {
                    return element.getTimestamp().getTime();
                }

                @Override
                public Watermark getCurrentWatermark() {
                    return new Watermark(eventTime);
                }
            })
            .map(e -> { e.setId(e.getEventId()); return e; });
        SingleOutputStreamOperator<FinalOutPutEvent> correlatedStream = filteredStream
            .keyBy(new KeySelector<IncomingEvent, String>() {
                @Override
                public String getKey(@Nonnull IncomingEvent input) throws Exception {
                    return input.getId();
                }
            })
            .window(GlobalWindows.create()).allowedLateness(Time.seconds(timePeriodBetweenEvents))
            .trigger(new Trigger<IncomingEvent, GlobalWindow>() {
                // anonymous classes cannot declare constructors, so the timeout is initialized inline
                private final long sessionTimeOut = 30_000L;

                @Override
                public TriggerResult onElement(IncomingEvent element, long timestamp, GlobalWindow window, TriggerContext ctx)
                        throws Exception {
                    ctx.registerProcessingTimeTimer(timestamp + sessionTimeOut);
                    return TriggerResult.CONTINUE;
                }

                @Override
                public TriggerResult onProcessingTime(long time, GlobalWindow window, TriggerContext ctx) throws Exception {
                    return TriggerResult.FIRE_AND_PURGE;
                }

                @Override
                public TriggerResult onEventTime(long time, GlobalWindow window, TriggerContext ctx) throws Exception {
                    return TriggerResult.CONTINUE;
                }

                @Override
                public void clear(GlobalWindow window, TriggerContext ctx) throws Exception {
                    // check the clear method implementation
                }
            })
            .process(new ProcessWindowFunction<IncomingEvent, FinalOutPutEvent, String, GlobalWindow>() {
                @Override
                public void process(String key,
                        ProcessWindowFunction<IncomingEvent, FinalOutPutEvent, String, GlobalWindow>.Context context,
                        Iterable<IncomingEvent> input, Collector<FinalOutPutEvent> out) throws Exception {
                    List<IncomingEvent> eventsIn = new ArrayList<>();
                    input.forEach(eventsIn::add);
                    if (eventsIn.size() == 1) {
                        // Logic to handle incomplete request/response events
                    } else if (eventsIn.size() == 2) {
                        // Logic to handle the complete request/response and how much time it took
                    }
                }
            });
        FlinkKafkaProducer<FinalOutPutEvent> kafkaProducer = new FlinkKafkaProducer<>(
            outWardTopicHostedOnServer, // broker list
            "target-topic",             // target topic
            new EventSerializationScheme(mapper));
        correlatedStream.addSink(kafkaProducer);
        env.execute("Streaming");
    }
}
Thanks, Vicky
Answer 0 (score: 2)
Based on your description, I think what you want is to write a custom ProcessFunction keyed by the session_id. It would have a ValueState storing the timestamp of the request event. When the matching response event arrives, you compute the delta, emit it (along with the session_id), and clear the state.
You would probably also want to set a timer when the request event arrives, so that if no response event shows up within some safe/long interval, you can emit a side output reporting the failed request.
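The keyed-state-plus-timer approach this answer describes could be sketched roughly as the following plain Java, outside of Flink entirely — a HashMap stands in for keyed ValueState, an explicit scan stands in for registered timers, and all names and the 120-second timeout are made up for illustration:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of per-key request/response matching. A HashMap stands in for
// Flink's keyed ValueState; timeouts are found by an explicit scan
// instead of registered timers.
public class RequestResponseSketch {
    static final long TIMEOUT_MS = 120_000;

    private final Map<String, Long> pendingRequests = new HashMap<>();
    private final List<String> timedOut = new ArrayList<>(); // stands in for a side output

    // Returns the request->response latency in ms, or null if this event is the request.
    public Long onEvent(String sessionId, long timestampMs) {
        Long requestedAt = pendingRequests.remove(sessionId);
        if (requestedAt == null) {
            pendingRequests.put(sessionId, timestampMs); // first event seen: the request
            return null;
        }
        return timestampMs - requestedAt; // second event: the response
    }

    // Called as time advances: flag requests whose response never arrived.
    public void expire(long nowMs) {
        pendingRequests.entrySet().removeIf(e -> {
            if (nowMs - e.getValue() > TIMEOUT_MS) {
                timedOut.add(e.getKey());
                return true;
            }
            return false;
        });
    }

    public List<String> timedOut() {
        return timedOut;
    }
}
```

The key point the sketch mirrors is that each session needs exactly one piece of state (the request timestamp), which is cleared as soon as the response is matched or the timeout fires.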
Answer 1 (score: 0)
With the default trigger, every window is finalized once its time has fully passed. Depending on whether you use EventTime or ProcessingTime this can mean different things, but in general Flink always waits for a window to close before processing it completely. In your case, the event at t+31 would simply go into another window rather than being ignored.
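To make the tumbling-window arithmetic concrete, here is a minimal plain-Java sketch (not the Flink API; the 30-second size is just the figure from the question) of how timestamps are bucketed into fixed windows:

```java
public class TumblingWindowSketch {
    static final long WINDOW_SIZE_MS = 30_000; // 30-second tumbling windows

    // Each timestamp maps to the start of the window [start, start + 30s) containing it.
    static long windowStartFor(long timestampMs) {
        return timestampMs - (timestampMs % WINDOW_SIZE_MS);
    }

    public static void main(String[] args) {
        long t = 0;
        // Events at t and t+3s land in the same window; that window still
        // only fires once time passes t+30s, even though both arrived early.
        System.out.println(windowStartFor(t) == windowStartFor(t + 3_000)); // true
        // An event at t+31s is not dropped; it falls into the next window.
        System.out.println(windowStartFor(t + 31_000)); // 30000
    }
}
```

Both early events share the window [0, 30s) yet must wait for it to close, and the t+31 event simply opens the next bucket, matching the behavior described above.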
As for session windows: they are windows too, which means that in the end they simply aggregate samples whose timestamps are less than a defined gap apart. Internally this is more complicated than with normal windows, because session windows have no predefined start and end. The session window operator takes each sample and creates a new window for it, then checks whether the newly created window can be merged with already existing ones (i.e. whether their timestamps are closer than the gap) and merges them. The end result is windows in which all elements' timestamps are within the defined gap of each other.
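The create-then-merge behavior described above can be sketched in plain Java — this is only interval merging under an assumed gap value, not the actual Flink operator:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of session-window merging: each timestamp opens its own
// window [t, t + gap), and overlapping windows are merged, so a session
// ends up covering all events spaced less than `gap` apart.
public class SessionMergeSketch {
    public static List<long[]> sessions(List<Long> timestamps, long gapMs) {
        List<Long> sorted = new ArrayList<>(timestamps);
        Collections.sort(sorted);
        List<long[]> merged = new ArrayList<>();
        for (long t : sorted) {
            long[] window = {t, t + gapMs}; // every event creates its own window
            long[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && window[0] < last[1]) {
                last[1] = Math.max(last[1], window[1]); // overlaps the previous window: merge
            } else {
                merged.add(window);
            }
        }
        return merged;
    }
}
```

For example, with a 5-second gap, events at 0s and 2s merge into one session ending at 7s, while an event at 10s starts a session of its own.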
Answer 2 (score: 0)
You are making this more complicated than it needs to be. The example below needs some tweaking, but hopefully it conveys the idea of how to use a KeyedProcessFunction rather than session windows.
Also, the constructor of BoundedOutOfOrdernessTimestampExtractor expects to be passed a Time maxOutOfOrderness. It is not clear why you are overriding its getCurrentWatermark method with an implementation that ignores the maxOutOfOrderness.
public static void main(String[] args) throws Exception {
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    DataStream<Event> events = ...

    events
        .assignTimestampsAndWatermarks(new TimestampsAndWatermarks(OUT_OF_ORDERNESS))
        .keyBy(e -> e.sessionId)
        .process(new RequestReponse())
        ...
}
public static class RequestReponse extends KeyedProcessFunction<KEY, Event, Long> {
    private ValueState<Long> requestTimeState;

    @Override
    public void open(Configuration config) {
        ValueStateDescriptor<Long> descriptor = new ValueStateDescriptor<>(
            "request time", Long.class);
        requestTimeState = getRuntimeContext().getState(descriptor);
    }

    @Override
    public void processElement(Event event, Context context, Collector<Long> out) throws Exception {
        TimerService timerService = context.timerService();
        Long requestedAt = requestTimeState.value();
        if (requestedAt == null) {
            // haven't seen the request before; save its timestamp
            requestTimeState.update(event.timestamp);
            timerService.registerEventTimeTimer(event.timestamp + TIMEOUT);
        } else {
            // this event is the response;
            // emit the time elapsed between request and response
            out.collect(event.timestamp - requestedAt);
            requestTimeState.clear();
        }
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext context, Collector<Long> out) throws Exception {
        // handle incomplete request/response events
    }
}
public static class TimestampsAndWatermarks extends BoundedOutOfOrdernessTimestampExtractor<Event> {
    public TimestampsAndWatermarks(Time t) {
        super(t);
    }

    @Override
    public long extractTimestamp(Event event) {
        return event.eventTime;
    }
}