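I have the following Kafka Streams application, in which I need to aggregate data by a custom key. The key will vary, but for simplicity I have started by re-keying on a single field (textId in SampleMessage). After the group-by I need the sum of amount (amount is a double field in the SampleMessage class). This is what I have come up with.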
StreamsBuilder builder = new StreamsBuilder();
builder = builder.addStateStore(Stores.keyValueStoreBuilder(
        Stores.inMemoryKeyValueStore("myStore"),
        Serdes.String(),
        Serdes.Long()).withLoggingDisabled());
KTable<String, SampleMessage> sampleMsgKtable = builder.table(TOPIC_NAME,
        Consumed.with(Serdes.String(), sampleMsgSerde));
KGroupedTable<String, SampleMessage> groupByAggregation = sampleMsgKtable.groupBy((key, value) -> {
    String groupBy = getGroupBy(/**Params **/); // key is now textId
    return KeyValue.pair(groupBy, value);
}, Serialized.with(Serdes.String(), sampleMsgSerde));
KTable<String, SampleMessage> reduce = groupByAggregation.reduce(
        (current, newValue) -> {
            double currentAmount = current.getAmount();
            double newAmount = newValue.getAmount();
            double total = currentAmount + newAmount;
            current.setAmount(total);
            return current;
        },
        (val, agg) -> {
            double valAmount = val.getAmount();
            double aggAmount = agg.getAmount();
            double diff = aggAmount - valAmount;
            agg.setAmount(diff);
            return agg;
        });
KTable<String, String> finalData = myTransformer.transformToString(reduce);
finalData.toStream().to("output");
I tested the code above with the following messages (using kafka-streams-test-utils-1.1.0). The 5 messages are:
1. textId = x , amount = 45
2. textId = x , amount = 45
3. textId = x , amount = 45
4. textId = x , amount = 45
5. textId = y , amount = 45
I got the following output:
1. textId = x , amount = 45
2. textId = x , amount = 90
3. textId = x , amount = 135
4. textId = x , amount = 180
5. textId = y , amount = 45
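To make the semantics behind this output concrete, here is a minimal plain-Java sketch (no Kafka dependency) of the per-textId running sum. It assumes each of the five messages arrives under a distinct source key, so only the adder step applies and the subtractor never fires. `RunningSumSketch`, `Msg`, and `runningSums` are hypothetical names used only for illustration, not part of the application above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RunningSumSketch {

    // Hypothetical stand-in for SampleMessage: only the two fields used here.
    static final class Msg {
        final String textId;
        final double amount;

        Msg(String textId, double amount) {
            this.textId = textId;
            this.amount = amount;
        }
    }

    // Applies the adder step (aggregate + new amount) per textId and records
    // the updated aggregate after every message, one entry per update.
    static List<String> runningSums(List<Msg> messages) {
        Map<String, Double> totals = new LinkedHashMap<>();
        List<String> updates = new ArrayList<>();
        for (Msg m : messages) {
            double total = totals.merge(m.textId, m.amount, Double::sum);
            updates.add(m.textId + "=" + total);
        }
        return updates;
    }

    public static void main(String[] args) {
        List<Msg> input = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            input.add(new Msg("x", 45));
        }
        input.add(new Msg("y", 45));
        // Prints one line per update: x=45.0, x=90.0, x=135.0, x=180.0, y=45.0
        runningSums(input).forEach(System.out::println);
    }
}
```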
Now I want to aggregate based on a time window (for example, aggregate over 5-minute intervals). How can I do this with KTables?
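To pin down the behaviour being asked for, here is a minimal plain-Java sketch of summing amount per textId within fixed, non-overlapping 5-minute windows. It does not use the Kafka Streams API. The timestamps, the `textId@windowStart` key format, and the names `TumblingWindowSketch`, `windowStart`, and `sumPerWindow` are all assumptions made for illustration, and timestamps are assumed to be non-negative epoch milliseconds.

```java
import java.util.Map;
import java.util.TreeMap;

public class TumblingWindowSketch {

    // Window size: 5 minutes in milliseconds.
    static final long WINDOW_MS = 5 * 60 * 1000L;

    // Start of the 5-minute window that contains the given timestamp.
    static long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % WINDOW_MS);
    }

    // Sums amounts per (textId, window); the three arrays are parallel,
    // with index i describing one message (hypothetical layout).
    static Map<String, Double> sumPerWindow(String[] textIds, long[] timestampsMs, double[] amounts) {
        Map<String, Double> sums = new TreeMap<>();
        for (int i = 0; i < textIds.length; i++) {
            String key = textIds[i] + "@" + windowStart(timestampsMs[i]);
            sums.merge(key, amounts[i], Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        String[] ids = {"x", "x", "x", "y"};
        long[] ts = {0L, 60_000L, 300_000L, 10_000L}; // made-up timestamps
        double[] amounts = {45, 45, 45, 45};
        // The first two "x" messages share window 0; the third falls in window 300000.
        System.out.println(sumPerWindow(ids, ts, amounts));
    }
}
```

With these made-up timestamps, the two "x" messages in the first five minutes add up to 90, while the third "x" message starts a new total of 45 in the next window.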