Kafka Streams loses aggregated values on restart

Asked: 2018-06-06 23:31:41

Tags: java apache-kafka apache-kafka-streams

I aggregate values on a stream as follows:

private KTable<String, StringAggregator> aggregate(KStream<String, String> inputStream) {
    return inputStream
            .groupByKey(Serialized.with(Serdes.String(), Serdes.String()))
            .aggregate(
                    StringAggregator::new,        // initializer for keys seen for the first time
                    (k, v, a) -> {
                        a.add(v);                 // fold each new value into the aggregate
                        return a;
                    },
                    Materialized.<String, StringAggregator>as(
                                    Stores.persistentKeyValueStore("STATE_STORE"))
                            .withKeySerde(Serdes.String())
                            .withValueSerde(getValueSerde(StringAggregator.class)));
}
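The StringAggregator class itself is not shown in the question; only its add(String) and size() calls appear in the code. A minimal sketch of what it might look like (the List-backed implementation is an assumption):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical StringAggregator matching the calls made in the question:
// add(String) in the aggregator lambda and size() in the peek() logger.
class StringAggregator {
    private final List<String> values = new ArrayList<>();

    public void add(String value) {
        values.add(value);
    }

    public int size() {
        return values.size();
    }
}
```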

Normally this works fine. However, when the application is restarted, the aggregated values for each key are lost. On top of that, the whole server may be terminated and a new server (running a new version of the streams application) brought online. How can I make sure the aggregated values survive?
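For reference, a persistent state store in Kafka Streams is normally backed by a compacted changelog topic and is restored from it automatically on restart, provided the application keeps the same application.id across deployments (and, for fast local-disk reuse, the same state.dir). A minimal configuration sketch, assuming the standard Kafka Streams property names; the broker address and application name are placeholders:

```java
import java.util.Properties;

class StreamsConfigSketch {
    // Returns the settings that govern state recovery; the values are
    // placeholders, the property names are the standard Kafka Streams ones.
    static Properties streamsProperties() {
        Properties props = new Properties();
        // Must stay stable across restarts and redeployments: it names the
        // consumer group AND the internal changelog/state-store topics.
        props.put("application.id", "my-aggregator-app");
        props.put("bootstrap.servers", "localhost:9092");
        // Local RocksDB state lives here; if this directory is lost (e.g. a
        // brand-new server), the store is rebuilt from the changelog topic.
        props.put("state.dir", "/var/lib/kafka-streams");
        return props;
    }
}
```

If aggregates disappear after a restart, the usual culprit is an application.id that changes between runs, which makes the new instance look for a different set of internal topics.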

1 answer:

Answer 0 (score: 0)

I ended up building aggregation logic that reuses the aggregation results already saved to a Kafka topic. Here is the logic:

private KStream<String, StringAggregator> getAggregator(String topicName,
                                                        KStream<String, String> input,
                                                        KTable<String, StringAggregator> aggregator) {

    return input
            .leftJoin(aggregator, (inputMessage, aggregatorMessage) -> {
                // No previous aggregate for this key yet: start a fresh one.
                if (aggregatorMessage == null) {
                    aggregatorMessage = new StringAggregator();
                }
                aggregatorMessage.add(inputMessage);
                return aggregatorMessage;
            })
            .peek((k, v) -> logger.info("Aggregated a join input for {}: {}, {} aggregated.", topicName, k, v.size()));
}

And here is the logic that actually builds the stream topology:

String topicName = "input";
KStream<String, String> input = streamsBuilder.stream(topicName);
// The table is read from, and written back to, the same "aggregate" topic,
// so the latest aggregate per key is always available after a restart.
KTable<String, StringAggregator> aggregator = streamsBuilder.table("aggregate");
getAggregator(topicName, input, aggregator).to("aggregate");
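The effect of this topology can be illustrated with a plain-Java simulation, where a HashMap stands in for the "aggregate" KTable: every input record is joined with the current aggregate for its key, extended, and written back, so the latest aggregate always lives in the topic rather than only in process-local state. This is a sketch of the data flow, not Kafka Streams code; the map stands in for the compacted topic and a List stands in for StringAggregator:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class JoinAggregationSketch {
    // Stand-in for the compacted "aggregate" topic backing the KTable.
    static final Map<String, List<String>> table = new HashMap<>();

    // Mirrors the leftJoin logic above: fetch the previous aggregate
    // (possibly absent), fold in the new value, and write the result back.
    static void process(String key, String value) {
        List<String> agg = table.get(key);   // aggregatorMessage, may be null
        if (agg == null) {
            agg = new ArrayList<>();         // new StringAggregator()
        }
        agg.add(value);                      // aggregatorMessage.add(inputMessage)
        table.put(key, agg);                 // .to("aggregate")
    }
}
```

A "restart" in this model loses nothing, because each join reads the previous aggregate back from the topic (the map) instead of relying on a local state store surviving the restart.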