Kafka Streams session windows: aggregation fails intermittently

Time: 2017-06-02 19:38:17

Tags: session apache-kafka aggregation apache-kafka-streams

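We are using Kafka Streams (0.10.2.0) to aggregate related events, with SessionWindows doing the grouping. The aggregation fails to happen roughly 50% of the time. Here is the scenario: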

Request 1: all events are aggregated successfully

  • GroupID: abc

  • Events:
      E1; EVENTTIME = 2017-05-31T14:36:56.653Z
      E2; EVENTTIME = 2017-05-31T14:36:56.653Z
      E3; EVENTTIME = 2017-05-31T14:36:56.653Z

Request 2: the events are not aggregated

  • GroupID: efg

  • Events:
      E1; EVENTTIME = 2017-05-31T14:36:56.653Z
      E2; EVENTTIME = 2017-05-31T14:36:56.653Z
      E3; EVENTTIME = 2017-05-31T14:36:56.653Z

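Stream details
TimestampExtractor: the stream uses an event-time extractor to assign events to windows.

Window type: session windows.
Window inactivity gap = 2 minutes

Stream configuration:
CACHE_MAX_BYTES_BUFFERING_CONFIG = 0
TIMESTAMP_EXTRACTOR_CLASS_CONFIG = EventTimeExtractorImpl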
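
To make the expected windowing concrete, here is a minimal stand-alone sketch (plain Java, no Kafka dependency) of how event-time session windows with a 2-minute inactivity gap should treat the three events above. The class and method names are hypothetical, and the model is simplified: it sorts timestamps and ignores retention and per-key state, so it illustrates the expectation rather than what Kafka Streams does internally.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-alone model of session windowing; this is not Kafka Streams code.
class SessionWindowSketch {

    // Groups event timestamps (epoch millis) into sessions. A timestamp joins the
    // current session when it is at most gapMs after that session's latest timestamp.
    // Each session is returned as {start, end}.
    static List<long[]> sessionize(List<Long> timestamps, long gapMs) {
        List<Long> sorted = new ArrayList<>(timestamps);
        Collections.sort(sorted);
        List<long[]> sessions = new ArrayList<>();
        for (long ts : sorted) {
            long[] last = sessions.isEmpty() ? null : sessions.get(sessions.size() - 1);
            if (last != null && ts - last[1] <= gapMs) {
                last[1] = ts;                       // extend the current session
            } else {
                sessions.add(new long[] {ts, ts});  // start a new session
            }
        }
        return sessions;
    }

    public static void main(String[] args) {
        // Same event time as E1, E2 and E3 in the scenario
        long eventTime = Instant.parse("2017-05-31T14:36:56.653Z").toEpochMilli();
        long gapMs = Duration.ofMinutes(2).toMillis();
        List<long[]> sessions = sessionize(Arrays.asList(eventTime, eventTime, eventTime), gapMs);
        for (long[] s : sessions) {
            System.out.println("start=" + s[0] + " end=" + s[1]);
        }
        // prints: start=1496241416653 end=1496241416653
    }
}
```

Under this model all three events fall into one session whose start and end are both 1496241416653, which matches the `start ==>` and `end ==>` values in the logs for both groups.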

Code snippet:

    // Imports used by this snippet; GenericRecord and the project's own serdes are not shown.
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.kstream.*;
    import org.apache.kafka.streams.kstream.internals.SessionWindow;

    KStreamBuilder builder = new KStreamBuilder();
    final KStream<String, GenericRecord> events = builder.stream(_appProperties.collationSourceTopic);
    events.print();
    KGroupedStream<String, GenericRecord> groupedStream = events.groupByKey(Serdes.String(), GenericCdsSerde.GenericCdsSerde());
    // Session inactivity gap, configured in minutes
    SessionWindows tmpSessionWindows = SessionWindows.with(TimeUnit.MINUTES.toMillis(Long.parseLong(_appProperties.collationWindowInMins)));

    KTable<Windowed<String>, List<GenericRecord>> sessionizedAggregatedStream = groupedStream.aggregate(
            ArrayList::new, /* initializer: empty list per session */
            (aggKey, newValue, aggValue) -> { /* aggregator: append the new event */
                try {
                    aggValue.add(newValue);
                } catch (Exception e) {
                    logger.error("failed aggr session windows", e);
                    return null;
                }
                return aggValue;
            },
            (aggKey, leftAggValue, rightAggValue) -> { /* session merger: concatenate both sessions */
                try {
                    leftAggValue.addAll(rightAggValue);
                } catch (Exception e) {
                    logger.error("failed merging session windows", e);
                    return null;
                }
                return leftAggValue;
            },
            tmpSessionWindows, /* session window */
            GenericListCdsSerde.GenericListCdsSerde(), /* serde for the aggregated list */
            "session-store"); /* state store name */

    sessionizedAggregatedStream.print();
    sessionizedAggregatedStream.toStream().foreach((stringWindowed, s) ->
            logger.info("WindowedTable: window: " + stringWindowed.key()
                    + "start ==> " + ((SessionWindow) stringWindowed.window()).start()
                    + " end ==> " + stringWindowed.window().end()
                    + " windowedValue: " + s));
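
For reference, the aggregator and session merger above are meant to behave like a plain list append and a list concatenation. The following stand-alone sketch exercises that logic outside Kafka. It is only an illustration: `String` stands in for `GenericRecord`, the class and method names are hypothetical, and it says nothing about how the state store or the serdes behave.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone check of the aggregation logic; String stands in for GenericRecord.
class AggregationLogicSketch {

    // Mirrors the aggregator lambda: append the new value to the running aggregate.
    static List<String> aggregate(String key, String newValue, List<String> aggValue) {
        aggValue.add(newValue);
        return aggValue;
    }

    // Mirrors the session merger lambda: concatenate two session aggregates.
    static List<String> merge(String key, List<String> left, List<String> right) {
        left.addAll(right);
        return left;
    }

    public static void main(String[] args) {
        List<String> agg = new ArrayList<>();   // what the initializer ArrayList::new produces
        agg = aggregate("ng30", "E1", agg);
        agg = aggregate("ng30", "E2", agg);
        List<String> other = new ArrayList<>(Arrays.asList("E3"));
        System.out.println(merge("ng30", agg, other));
        // prints: [E1, E2, E3]
    }
}
```

With the previous aggregate passed in each time, the list grows to three entries. This is the result seen for group ng28 in the logs that follow.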

Logs for the successfully grouped events, GroupID: ng28

Event 1 arrives:

2017-06-01 09:57:23,861 INFO StreamThread-1 c.s.c.c.s.EventTimeExtractor:49 - =======reading groupID:ng28 eventID:1 eventTime:2017-05-31T14:36:56.653Z

[KSTREAM-SOURCE-0000000000]: ng28, {"eventID":"1","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}

[KSTREAM-AGGREGATE-0000000003]: [ng28 @ 1496241416653], ([{"eventID":"1","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}]<-null)

2017-06-01 09:57:23,864 INFO StreamThread-1 c.s.c.c.s.CSIStreamFactory:196 - WindowedTable: window: ng28start ==> 1496241416653 end ==> 1496241416653 windowedValue: [{"eventID":"1","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}]

Event 2 arrives:

2017-06-01 09:57:27,158 INFO StreamThread-1 c.s.c.c.s.EventTimeExtractor:49 - =======reading groupID:ng28 eventID:2 eventTime:2017-05-31T14:36:56.653Z

[KSTREAM-SOURCE-0000000000]: ng28, {"eventID":"2","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}

2017-06-01 09:57:27,158 INFO StreamThread-1 c.s.c.c.s.CSIStreamFactory:164 - ------- raw stream:key:ng28 value:{"eventID":"2","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}

[KSTREAM-AGGREGATE-0000000003]: [ng28 @ 1496241416653], ([{"eventID":"1","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"},{"eventID":"2","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}]<-null)

2017-06-01 09:57:27,160 INFO StreamThread-1 c.s.c.c.s.CSIStreamFactory:196 - WindowedTable: window: ng28start ==> 1496241416653 end ==> 1496241416653 windowedValue: [{"eventID":"1","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"},{"eventID":"2","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}]

Event 3 arrives:

2017-06-01 09:57:31,481 INFO StreamThread-1 c.s.c.c.s.EventTimeExtractor:49 - =======reading groupID:ng28 eventID:3 eventTime:2017-05-31T14:36:56.653Z

[KSTREAM-SOURCE-0000000000]: ng28, {"eventID":"3","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}

2017-06-01 09:57:31,482 INFO StreamThread-1 c.s.c.c.s.CSIStreamFactory:164 - ------- raw stream:key:ng28 value:{"eventID":"3","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}

[KSTREAM-AGGREGATE-0000000003]: [ng28 @ 1496241416653], ([{"eventID":"1","clientEventID":"123","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"},{"eventID":"2","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"},{"eventID":"3","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}]<-null)

2017-06-01 09:57:31,484 INFO StreamThread-1 c.s.c.c.s.CSIStreamFactory:196 - WindowedTable: window: ng28start ==> 1496241416653 end ==> 1496241416653 windowedValue: [{"eventID":"1","clientEventID":"123","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"},{"eventID":"2","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"},{"eventID":"3","groupID":"ng28","eventTime":"2017-05-31T14:36:56.653Z"}]

Logs for the events that failed to group, GroupID: ng30. The event times of all events are similar to those of the ng28 group.

Event 1 arrives:

2017-06-01 10:00:03,004 INFO StreamThread-1 c.s.c.c.s.EventTimeExtractor:49 - =======reading groupID:ng30 eventID:1 eventTime:2017-05-31T14:36:56.653Z

[KSTREAM-SOURCE-0000000000]: ng30, {"eventID":"1","groupID":"ng30","eventTime":"2017-05-31T14:36:56.653Z"}

[KSTREAM-AGGREGATE-0000000003]: [ng30 @ 1496241416653], ([{"eventID":"1","groupID":"ng30","eventTime":"2017-05-31T14:36:56.653Z","className":"com.syniverse.cds.domain.Event"}]<-null)

2017-06-01 10:00:03,007 INFO StreamThread-1 c.s.c.c.s.CSIStreamFactory:196 - WindowedTable: window: ng30start ==> 1496241416653 end ==> 1496241416653 windowedValue: [{"eventID":"1","groupID":"ng30","eventTime":"2017-05-31T14:36:56.653Z"}]

Event 2 arrives:

2017-06-01 10:00:09,225 INFO StreamThread-1 c.s.c.c.s.EventTimeExtractor:49 - =======reading groupID:ng30 eventID:2 eventTime:2017-05-31T14:36:56.653Z

[KSTREAM-SOURCE-0000000000]: ng30, {"eventID":"2","groupID":"ng30","eventTime":"2017-05-31T14:36:56.653Z"}

[KSTREAM-AGGREGATE-0000000003]: [ng30 @ 1496241416653], ([{"eventID":"2","groupID":"ng30","eventTime":"2017-05-31T14:36:56.653Z"}]<-null)

2017-06-01 10:00:09,227 INFO StreamThread-1 c.s.c.c.s.CSIStreamFactory:196 - WindowedTable: window: ng30start ==> 1496241416653 end ==> 1496241416653 windowedValue: [{"eventID":"2","groupID":"ng30","eventTime":"2017-05-31T14:36:56.653Z"}]

Event 3 arrives:

2017-06-01 10:00:14,546 INFO StreamThread-1 c.s.c.c.s.EventTimeExtractor:49 - =======reading groupID:ng30 eventID:3 eventTime:2017-05-31T14:36:56.653Z

[KSTREAM-SOURCE-0000000000]: ng30, {"eventID":"3","clientEventID":"123","groupID":"ng30","eventTime":"2017-05-31T14:36:56.653Z"}

[KSTREAM-AGGREGATE-0000000003]: [ng30 @ 1496241416653], ([{"eventID":"3","groupID":"ng30","eventTime":"2017-05-31T14:36:56.653Z"}]<-null)

2017-06-01 10:00:14,547 INFO StreamThread-1 c.s.c.c.s.CSIStreamFactory:196 - WindowedTable: window: ng30start ==> 1496241416653 end ==> 1496241416653 windowedValue: [{"eventID":"3","groupID":"ng30","eventTime":"2017-05-31T14:36:56.653Z"}]

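After the second and third events arrive, the earlier events no longer appear in the aggregation step: each aggregate contains only the newest event.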

We have tried different combinations of:
1. cache.max.bytes.buffering: set to 0 as well as 10 KB
2. commit.interval.ms: the default value as well as values ranging from 5 ms to 2 seconds.
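
A minimal sketch of how those combinations could be expressed, using only `java.util.Properties` and the two keys named above. The class and method names are hypothetical. A real Kafka Streams application would also need at least `application.id` and `bootstrap.servers`, and would typically use the `StreamsConfig` constants instead of string literals.

```java
import java.util.Properties;

// Hypothetical helper that builds the two settings varied in the experiments above.
class StreamsSettingsSketch {

    static Properties settings(long cacheMaxBytes, long commitIntervalMs) {
        Properties props = new Properties();
        props.setProperty("cache.max.bytes.buffering", Long.toString(cacheMaxBytes));
        props.setProperty("commit.interval.ms", Long.toString(commitIntervalMs));
        return props;
    }

    public static void main(String[] args) {
        long[] cacheSizes = {0L, 10L * 1024L};    // 0 and 10 KB
        long[] commitIntervals = {5L, 2000L};     // 5 ms and 2 seconds
        for (long cache : cacheSizes) {
            for (long interval : commitIntervals) {
                Properties p = settings(cache, interval);
                System.out.println(p.getProperty("cache.max.bytes.buffering")
                        + " / " + p.getProperty("commit.interval.ms"));
            }
        }
    }
}
```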

Is this caused by a bug, or by a missing configuration? Thanks for your help.

0 answers:

No answers yet