Kafka Streams: aggregate results using time windows?

Time: 2018-06-04 10:36:17

Tags: java apache-kafka

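I have the following Kafka Streams application, in which I need to aggregate data using a custom key. The key will vary, but for simplicity I started by re-keying on a single field (textId in SampleMessage). After the groupBy I need to get sum(amount), where amount is a double field in the SampleMessage class. This is what I came up with: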

StreamsBuilder builder = new StreamsBuilder();

builder = builder.addStateStore(Stores.keyValueStoreBuilder(
            Stores.inMemoryKeyValueStore("myStore"),
            Serdes.String(),
            Serdes.Long()).withLoggingDisabled());


KTable<String, SampleMessage> sampleMsgKtable = builder.table(TOPIC_NAME,
            Consumed.with(Serdes.String(), sampleMsgSerde));

KGroupedTable<String, SampleMessage> groupByAggregation = sampleMsgKtable.groupBy((key, value) -> {
            String groupBy = getGroupBy(/**Params **/); // key is now textId
            return KeyValue.pair(groupBy, value);
        }, Serialized.with(Serdes.String(), sampleMsgSerde));

KTable<String, SampleMessage> reduce = groupByAggregation.reduce(
            // adder: called with (current aggregate, new value)
            (current, newValue) -> {

                double currentAmount = current.getAmount();
                double newAmount = newValue.getAmount();
                double total = currentAmount + newAmount;
                current.setAmount(total);

                return current;
            },
            // subtractor: called with (current aggregate, old value) when an input
            // record is replaced; the old amount is removed from the aggregate
            (agg, oldValue) -> {

                double aggAmount = agg.getAmount();
                double oldAmount = oldValue.getAmount();
                double diff = aggAmount - oldAmount;
                agg.setAmount(diff);

                return agg;
            });

KTable<String, String> finalData = myTransformer.transformToString(reduce);

finalData.toStream().to("output");

I tested the code above (using kafka-streams-test-utils-1.1.0) with the following 5 messages:

1. textId = x, amount = 45
2. textId = x, amount = 45
3. textId = x, amount = 45
4. textId = x, amount = 45
5. textId = y, amount = 45

I got the following output:

1. textId = x, amount = 45
2. textId = x, amount = 90
3. textId = x, amount = 135
4. textId = x, amount = 180
5. textId = y, amount = 45
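The running totals above follow from how the adder and subtractor of `KGroupedTable.reduce` are applied. Below is a simplified, Kafka-free model of that behaviour; the class name `TableSumModel` and the input keys `k1` to `k5` are hypothetical. It assumes each of the five messages has a distinct input-topic key, which is what the cumulative results suggest: if two messages shared a key, the later one would replace the earlier one rather than add to it. Caching, repartitioning and serialization are ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, Kafka-free model of KGroupedTable.reduce() summing "amount" per textId.
class TableSumModel {
    // Aggregate per group key (textId), standing in for the reduce() state store.
    private final Map<String, Double> totals = new HashMap<>();
    // Latest (textId, amount) per input-table key, needed to retract an old value.
    private final Map<String, String> groupOf = new HashMap<>();
    private final Map<String, Double> amountOf = new HashMap<>();

    // Applies one input-table record and returns the updates it emits as "textId=total".
    List<String> upsert(String inputKey, String textId, double amount) {
        List<String> emitted = new ArrayList<>();
        if (groupOf.containsKey(inputKey)) {
            // Subtractor: (aggregate, oldValue) -> aggregate - oldValue, on the old group.
            String oldGroup = groupOf.get(inputKey);
            double reduced = totals.get(oldGroup) - amountOf.get(inputKey);
            totals.put(oldGroup, reduced);
            emitted.add(oldGroup + "=" + reduced);
        }
        // Adder: (aggregate, newValue) -> aggregate + newValue, on the new group.
        double added = totals.getOrDefault(textId, 0.0) + amount;
        totals.put(textId, added);
        emitted.add(textId + "=" + added);
        groupOf.put(inputKey, textId);
        amountOf.put(inputKey, amount);
        return emitted;
    }

    public static void main(String[] args) {
        TableSumModel model = new TableSumModel();
        String[] textIds = {"x", "x", "x", "x", "y"};
        for (int i = 0; i < textIds.length; i++) {
            // Hypothetical distinct input keys k1..k5, each with amount = 45.
            System.out.println(model.upsert("k" + (i + 1), textIds[i], 45));
        }
        // Re-sending k1 with textId = y moves its 45 from x to y.
        System.out.println(model.upsert("k1", "y", 45));
    }
}
```

In this model, a new value for an existing input key (here `k1` switching from `x` to `y`) first runs the subtractor on the old group and then the adder on the new one, so `x` drops to 135 and `y` rises to 90.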

Now I want to aggregate based on a time window (for example, aggregate over 5-minute intervals). How can I do this with KTables?
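For reference, `KGroupedTable` itself offers no windowing operation; in the Kafka Streams DSL, `windowedBy(...)` with `TimeWindows` is available on `KGroupedStream`, which is why windowed sums are usually built from a `KStream` rather than a `KTable`. The sketch below does not use Kafka at all. It only models what a 5-minute tumbling-window sum per textId computes, assuming non-negative epoch-millisecond timestamps and epoch-aligned windows (start inclusive, end exclusive). The class name `TumblingWindowSum` and the sample timestamps are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Kafka-free model of a per-key sum over epoch-aligned tumbling windows.
class TumblingWindowSum {
    private final long windowSizeMs;
    // Running sum per (textId, window start), keyed as "textId@windowStartMs".
    private final Map<String, Double> sums = new HashMap<>();

    TumblingWindowSum(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    // Start of the window containing the timestamp (assumes timestampMs >= 0).
    long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % windowSizeMs);
    }

    // Adds one record's amount to its (textId, window) bucket and returns the new sum.
    double add(String textId, double amount, long timestampMs) {
        String windowedKey = textId + "@" + windowStart(timestampMs);
        double total = sums.getOrDefault(windowedKey, 0.0) + amount;
        sums.put(windowedKey, total);
        return total;
    }

    public static void main(String[] args) {
        TumblingWindowSum fiveMinutes = new TumblingWindowSum(5 * 60 * 1000L);
        System.out.println(fiveMinutes.add("x", 45, 0L));       // first window of x
        System.out.println(fiveMinutes.add("x", 45, 60_000L));  // same window, sum grows
        System.out.println(fiveMinutes.add("x", 45, 300_000L)); // next window starts over
        System.out.println(fiveMinutes.add("y", 45, 10_000L));  // separate key
    }
}
```

One design consequence worth noting: reading the topic with `builder.stream(...)` instead of `builder.table(...)` treats every record as a new event, so a new value for an existing input key is added to the window sum instead of replacing the old value (`KGroupedStream.reduce` takes no subtractor).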

0 answers:

No answers