I aggregate values on a stream as follows:
private KTable<String, StringAggregator> aggregate(KStream<String, String> inputStream) {
    return inputStream
        .groupByKey(Serialized.with(Serdes.String(), Serdes.String()))
        .aggregate(
            StringAggregator::new,
            (k, v, a) -> {
                a.add(v);
                return a;
            },
            Materialized.<String, StringAggregator>as(Stores.persistentKeyValueStore("STATE_STORE"))
                .withKeySerde(Serdes.String())
                .withValueSerde(getValueSerde(StringAggregator.class)));
}
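The StringAggregator type is not shown in the question; a minimal sketch, assuming it simply collects the joined string values (the class name matches the question, but the fields and methods are hypothetical — only add and size are actually used in the snippets here):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal aggregator: collects each incoming string value.
class StringAggregator {
    private final List<String> values = new ArrayList<>();

    // Append one value to the aggregate.
    public void add(String value) {
        values.add(value);
    }

    // Number of values aggregated so far.
    public int size() {
        return values.size();
    }
}
```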
Normally this works very well. However, when the application is restarted, the aggregated values for the keys are lost. It is also possible that the whole server is terminated and a new server (with a new version of the streams application) comes online. How can I make sure the aggregated values are still there?
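For context, a persistent state store is normally backed by a compacted changelog topic that Kafka Streams replays onto a fresh instance, so aggregates disappearing after a restart usually points at the deployment configuration rather than the store itself. A sketch of the settings worth checking (the property keys are standard Kafka Streams config names; the values are placeholders):

```java
import java.util.Properties;

// Sketch only: deployment settings that affect whether state survives a restart.
class StreamsRestartConfig {
    static Properties buildProps() {
        Properties props = new Properties();
        // Must stay identical across restarts and redeployments; changing the
        // application.id gives the app a new changelog/state namespace, so the
        // old aggregates look "lost" even though the changelog topic still exists.
        props.put("application.id", "string-aggregator-app"); // placeholder id
        // Local RocksDB directory; on a brand-new server the store is rebuilt
        // from the changelog topic, not from this directory.
        props.put("state.dir", "/var/lib/kafka-streams"); // placeholder path
        // Optional: keep a warm standby copy of the store on another instance
        // to speed up failover when a server dies.
        props.put("num.standby.replicas", "1");
        return props;
    }
}
```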
Answer (score: 0)
I ended up creating aggregation logic that uses the aggregated results already saved on a Kafka topic. Here is the logic:
private KStream<String, StringAggregator> getAggregator(String topicName,
        KStream<String, String> input,
        KTable<String, StringAggregator> aggregator) {
    return input
        .leftJoin(aggregator, (inputMessage, aggregatorMessage) -> {
            if (aggregatorMessage == null) {
                aggregatorMessage = new StringAggregator();
            }
            aggregatorMessage.add(inputMessage);
            return aggregatorMessage;
        })
        .peek((k, v) -> logger.info("Aggregated a join input for {}: {}, {} aggregated.", topicName, k, v.size()));
}
And here is the logic that actually builds the stream:
String topicName = "input";
KStream<String, String> input = streamsBuilder.stream(topicName);
KTable<String, StringAggregator> aggregator = streamsBuilder.table("aggregate");
getAggregator(topicName, input, aggregator).to("aggregate");
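One caveat with the topology above: streamsBuilder.table("aggregate") must read values with the same serde that was used to write them, or the restored aggregates will fail to deserialize. A sketch of making that explicit via Consumed.with, reusing the getValueSerde helper from the question (not a drop-in implementation, just an illustration of the idea):

```java
// Sketch only: make the table's serdes explicit so the StringAggregator
// records written by to("aggregate") can be read back in the same format.
KTable<String, StringAggregator> aggregator = streamsBuilder.table(
        "aggregate",
        Consumed.with(Serdes.String(), getValueSerde(StringAggregator.class)));
```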