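We are using Kafka Streams (0.10.2.0) to aggregate related events, and we use SessionWindows to do the aggregation. The aggregation does not seem to happen about 50% of the time. Here is the scenario:
Request-1: all events were aggregated successfully
GroupID: abc
Events: E1; eventTime = 2017-05-31T14:36:56.653Z E2; eventTime = 2017-05-31T14:36:56.653Z E3; eventTime = 2017-05-31T14:36:56.653Z
Request-2: no events were aggregated
GroupID: efg
Events: E1; eventTime = 2017-05-31T14:36:56.653Z E2; eventTime = 2017-05-31T14:36:56.653Z E3; eventTime = 2017-05-31T14:36:56.653Z
Stream info:
TimestampExtractor: the stream uses an event-time extractor to group events into windows.
Window type: session windows.
Window inactivity gap = 2 minutes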
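To make the expected behavior concrete, here is a minimal, library-free sketch of how we understand session-window merging for a single key. It is not Kafka Streams code: `SessionMergeSketch`, `Session`, and `add` are hypothetical names, and treating the inactivity-gap boundary as inclusive is an assumption. Under this model, three events with the same event time and a 2-minute gap should always end up in one session.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, library-free model of session-window merging; not Kafka Streams code.
final class SessionMergeSketch {

    // One session: [start, end] in epoch millis plus the event IDs collected so far.
    static final class Session {
        final long start;
        final long end;
        final List<String> events;

        Session(long start, long end, List<String> events) {
            this.start = start;
            this.end = end;
            this.events = events;
        }
    }

    private final long gapMs;
    private final List<Session> sessions = new ArrayList<>();

    SessionMergeSketch(long gapMs) {
        this.gapMs = gapMs;
    }

    // Adds one event and merges it with every existing session within gapMs of it.
    // Treating the gap boundary as inclusive is an assumption of this sketch.
    Session add(long timestamp, String eventId) {
        long start = timestamp;
        long end = timestamp;
        List<String> events = new ArrayList<>();
        List<Session> untouched = new ArrayList<>();
        for (Session s : sessions) {
            boolean withinGap = s.end >= timestamp - gapMs && s.start <= timestamp + gapMs;
            if (withinGap) {
                start = Math.min(start, s.start);
                end = Math.max(end, s.end);
                events.addAll(s.events); // carry over previously aggregated events
            } else {
                untouched.add(s);
            }
        }
        events.add(eventId);
        Session merged = new Session(start, end, events);
        sessions.clear();
        sessions.addAll(untouched);
        sessions.add(merged);
        return merged;
    }

    public static void main(String[] args) {
        SessionMergeSketch sketch = new SessionMergeSketch(2 * 60 * 1000L);
        long t = 1496241416653L; // 2017-05-31T14:36:56.653Z
        sketch.add(t, "E1");
        sketch.add(t, "E2");
        Session s = sketch.add(t, "E3");
        System.out.println(s.events); // prints [E1, E2, E3]
    }
}
```

In this model the second and third calls find the existing session and extend it, which is the behavior we see for one group but not for the other.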
Stream config:
CACHE_MAX_BYTES_BUFFERING_CONFIG = 0
TIMESTAMP_EXTRACTOR_CLASS_CONFIG = EventTimeExtractorImpl
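For reference, this is roughly the conversion an event-time extractor has to perform on the `eventTime` field. The sketch assumes the field is an ISO-8601 UTC string, as in the events above; `EventTimeSketch` and `toEpochMillis` are hypothetical names, and the real `EventTimeExtractorImpl` is not shown in this post.

```java
import java.time.Instant;

// Hypothetical helper: converts the ISO-8601 eventTime field to epoch milliseconds.
final class EventTimeSketch {

    static long toEpochMillis(String eventTime) {
        // Instant.parse accepts ISO-8601 instants such as 2017-05-31T14:36:56.653Z
        return Instant.parse(eventTime).toEpochMilli();
    }

    public static void main(String[] args) {
        // prints 1496241416653, the window start and end seen in the logs
        System.out.println(toEpochMillis("2017-05-31T14:36:56.653Z"));
    }
}
```

The result matches the window timestamp in the logs, so the extracted timestamps themselves look consistent for both groups.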
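Code snippet:
// Kafka Streams 0.10.2.0 imports; GenericRecord, GenericCdsSerde, GenericListCdsSerde,
// _appProperties and logger are defined elsewhere (not shown).
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.KGroupedStream;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.SessionWindows;
import org.apache.kafka.streams.kstream.Windowed;
import org.apache.kafka.streams.kstream.internals.SessionWindow;

KStreamBuilder builder = new KStreamBuilder();
final KStream<String, GenericRecord> events = builder.stream(_appProperties.collationSourceTopic);
events.print();

KGroupedStream<String, GenericRecord> groupedStream =
    events.groupByKey(Serdes.String(), GenericCdsSerde.GenericCdsSerde());

// Inactivity gap for the session windows, configured in minutes.
SessionWindows tmpSessionWindows = SessionWindows.with(
    TimeUnit.MINUTES.toMillis(Long.parseLong(_appProperties.collationWindowInMins)));

KTable<Windowed<String>, List<GenericRecord>> sessionizedAggregatedStream = groupedStream.aggregate(
    ArrayList::new,
    // Aggregator: append the new event to the session's list.
    (aggKey, newValue, aggValue) -> {
        try {
            aggValue.add(newValue);
        } catch (Exception e) {
            logger.error("failed aggr session windows", e);
            return null;
        }
        return aggValue;
    },
    // Merger: combine the lists of two sessions that are being merged.
    (aggKey, leftAggValue, rightAggValue) -> {
        try {
            leftAggValue.addAll(rightAggValue);
        } catch (Exception e) {
            logger.error("failed merging session windows", e);
            return null;
        }
        return leftAggValue;
    },
    tmpSessionWindows, /* session window */
    GenericListCdsSerde.GenericListCdsSerde(),
    "session-store");

sessionizedAggregatedStream.print();
sessionizedAggregatedStream.toStream().foreach((stringWindowed, s) ->
    logger.info("WindowedTable: window: " + stringWindowed.key()
        + "start ==> " + ((SessionWindow) stringWindowed.window()).start()
        + " end ==> " + stringWindowed.window().end()
        + " windowedValue: " + s));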
Logs for the successfully grouped events: GroupID: ng28
Event-1 arrival:
2017-06-01 09:57:23,861 INFO StreamThread-1 c.s.c.c.s.EventTimeExtractor:49 - =======reading groupID:ng28 eventID:1 eventTime:2017-05-31T14:36:56.653Z
[KSTREAM-SOURCE-0000000000]: ng28, {"eventID":"1","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}
[KSTREAM-AGGREGATE-0000000003]: [ng28@1496241416653], ([{"eventID":"1","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}]<-null)
2017-06-01 09:57:23,864 INFO StreamThread-1 c.s.c.c.s.CSIStreamFactory:196 - WindowedTable: window: ng28start ==> 1496241416653 end ==> 1496241416653 windowedValue: [{"eventID":"1","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}]
Event-2 arrival:
2017-06-01 09:57:27,158 INFO StreamThread-1 c.s.c.c.s.EventTimeExtractor:49 - =======reading groupID:ng28 eventID:2 eventTime:2017-05-31T14:36:56.653Z
[KSTREAM-SOURCE-0000000000]: ng28, {"eventID":"2","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}
2017-06-01 09:57:27,158 INFO StreamThread-1 c.s.c.c.s.CSIStreamFactory:164 - ------- raw stream: key:ng28 value:{"eventID":"2","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}
[KSTREAM-AGGREGATE-0000000003]: [ng28@1496241416653], ([{"eventID":"1","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"},{"eventID":"2","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}]<-null)
2017-06-01 09:57:27,160 INFO StreamThread-1 c.s.c.c.s.CSIStreamFactory:196 - WindowedTable: window: ng28start ==> 1496241416653 end ==> 1496241416653 windowedValue: [{"eventID":"1","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"},{"eventID":"2","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}]
Event-3 arrival:
2017-06-01 09:57:31,481 INFO StreamThread-1 c.s.c.c.s.EventTimeExtractor:49 - =======reading groupID:ng28 eventID:3 eventTime:2017-05-31T14:36:56.653Z
[KSTREAM-SOURCE-0000000000]: ng28, {"eventID":"3","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}
2017-06-01 09:57:31,482 INFO StreamThread-1 c.s.c.c.s.CSIStreamFactory:164 - ------- raw stream: key:ng28 value:{"eventID":"3","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}
[KSTREAM-AGGREGATE-0000000003]: [ng28@1496241416653], ([{"eventID":"1","clientEventID":"123","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"},{"eventID":"2","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"},{"eventID":"3","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}]<-null)
2017-06-01 09:57:31,484 INFO StreamThread-1 c.s.c.c.s.CSIStreamFactory:196 - WindowedTable: window: ng28start ==> 1496241416653 end ==> 1496241416653 windowedValue: [{"eventID":"1","clientEventID":"123","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"},{"eventID":"2","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"},{"eventID":"3","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}]
Logs for the events that failed to group: GroupID: ng30. The event times of all events match those of group ng28.
Event-1 arrival:
2017-06-01 10:00:03,004 INFO StreamThread-1 c.s.c.c.s.EventTimeExtractor:49 - =======reading groupID:ng30 eventID:1 eventTime:2017-05-31T14:36:56.653Z
[KSTREAM-SOURCE-0000000000]: ng30, {"eventID":"1","groupID":"ng30","eventTime":"2017-05-31T14:36:56.653Z"}
[KSTREAM-AGGREGATE-0000000003]: [ng30@1496241416653], ([{"eventID":"1","groupID":"ng30","eventTime":"2017-05-31T14:36:56.653Z","className":"com.syniverse.cds.domain.Event"}]<-null)
2017-06-01 10:00:03,007 INFO StreamThread-1 c.s.c.c.s.CSIStreamFactory:196 - WindowedTable: window: ng30start ==> 1496241416653 end ==> 1496241416653 windowedValue: [{"eventID":"1","groupID":"ng30","eventTime":"2017-05-31T14:36:56.653Z"}]
Event-2 arrival:
2017-06-01 10:00:09,225 INFO StreamThread-1 c.s.c.c.s.EventTimeExtractor:49 - =======reading groupID:ng30 eventID:2 eventTime:2017-05-31T14:36:56.653Z
[KSTREAM-SOURCE-0000000000]: ng30, {"eventID":"2","groupID":"ng30","eventTime":"2017-05-31T14:36:56.653Z"}
[KSTREAM-AGGREGATE-0000000003]: [ng30@1496241416653], ([{"eventID":"2","groupID":"ng30","eventTime":"2017-05-31T14:36:56.653Z"}]<-null)
2017-06-01 10:00:09,227 INFO StreamThread-1 c.s.c.c.s.CSIStreamFactory:196 - WindowedTable: window: ng30start ==> 1496241416653 end ==> 1496241416653 windowedValue: [{"eventID":"2","groupID":"ng30","eventTime":"2017-05-31T14:36:56.653Z"}]
Event-3 arrival:
2017-06-01 10:00:14,546 INFO StreamThread-1 c.s.c.c.s.EventTimeExtractor:49 - =======reading groupID:ng30 eventID:3 eventTime:2017-05-31T14:36:56.653Z
[KSTREAM-SOURCE-0000000000]: ng30, {"eventID":"3","clientEventID":"123","groupID":"ng30","eventTime":"2017-05-31T14:36:56.653Z"}
[KSTREAM-AGGREGATE-0000000003]: [ng30@1496241416653], ([{"eventID":"3","groupID":"ng30","eventTime":"2017-05-31T14:36:56.653Z"}]<-null)
2017-06-01 10:00:14,547 INFO StreamThread-1 c.s.c.c.s.CSIStreamFactory:196 - WindowedTable: window: ng30start ==> 1496241416653 end ==> 1496241416653 windowedValue: [{"eventID":"3","groupID":"ng30","eventTime":"2017-05-31T14:36:56.653Z"}]
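After the second and third events arrive, the earlier events no longer appear in the aggregation step: each aggregate contains only the event that just arrived.
We have tried different combinations:
1. cache.max.bytes.buffering: set to 0 as well as 10 KB
2. commit.interval.ms: the default value as well as values ranging from 5 ms to 2 seconds.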
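For clarity, the combinations above were set roughly as in the following sketch. It uses only `java.util.Properties` and the plain string keys; `StreamsConfigSketch` and `build` are hypothetical names, the values shown are just one of the combinations we tried, and the extractor entry is a placeholder because the package of `EventTimeExtractorImpl` is not shown in this post.

```java
import java.util.Properties;

// Hypothetical helper that builds one of the config combinations we tried.
final class StreamsConfigSketch {

    static Properties build(long cacheBytes, long commitIntervalMs) {
        Properties props = new Properties();
        // 0 disables record caching; we also tried 10 KB (10 * 1024 bytes).
        props.put("cache.max.bytes.buffering", Long.toString(cacheBytes));
        // We tried the default as well as values from 5 ms to 2000 ms.
        props.put("commit.interval.ms", Long.toString(commitIntervalMs));
        // Placeholder: the real setting needs the fully qualified class name.
        props.put("timestamp.extractor", "EventTimeExtractorImpl");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build(10 * 1024L, 5L);
        System.out.println(props.getProperty("cache.max.bytes.buffering")); // prints 10240
    }
}
```

None of these combinations changed the behavior described above.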
Is this caused by a bug or by a missing configuration? Thanks for your help.