What is a reasonable value for StreamsConfig.COMMIT_INTERVAL_MS_CONFIG in Kafka Streams?

Time: 2019-02-15 12:46:06

Tags: apache-kafka-streams

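I have been looking at some Kafka Streams examples, and the different values they use for the configuration 'StreamsConfig.COMMIT_INTERVAL_MS_CONFIG' confuse me a bit.

For example, in the microservices example,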

config.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1); //commit as fast as possible

https://github.com/confluentinc/kafka-streams-examples/blob/5.1.0-post/src/main/java/io/confluent/examples/streams/microservices/util/MicroserviceUtils.java

Another one,

// Records should be flushed every 10 seconds. This is less than the default
// in order to keep this example interactive.
streamsConfiguration.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10 * 1000);

https://github.com/confluentinc/kafka-streams-examples/blob/5.1.0-post/src/main/java/io/confluent/examples/streams/WordCountLambdaExample.java

And another,

// Set the commit interval to 500ms so that any changes are flushed frequently and the top five
// charts are updated with low latency.
streamsConfiguration.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 500);

https://github.com/confluentinc/kafka-streams-examples/blob/5.1.0-post/src/main/java/io/confluent/examples/streams/interactivequeries/kafkamusic/KafkaMusicExample.java

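Across these examples the interval ranges from 1 ms to 10000 ms. What I am really interested in is the 1 ms case in a system that is constantly under heavy load: would a 1 ms commit interval be dangerous?

Thanks.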

1 Answer:

Answer 0: (score: 1)

Well, it depends on how frequently you want to commit records. The setting actually relates to the in-memory record caching:

https://kafka.apache.org/21/documentation/streams/developer-guide/memory-mgmt.html#record-caches-in-the-dsl

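If you want to see every record as output, you can set it to the lowest value. In some scenarios you want an output for every event, and there the lowest value makes sense. In other scenarios, where it is fine to consolidate events and produce less output, you can set it to a higher value.

Also note that record caches are affected by these two configurations: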

commit.interval.ms

cache.max.bytes.buffering
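
As a minimal sketch of how the two settings sit side by side, the snippet below puts both keys into a plain java.util.Properties object using their literal string names, so it runs without the Kafka dependency. The class name, the helper buildConfig, the application id, the broker address, and the chosen values (1 second, 10 MB) are all placeholders picked for illustration, not recommendations.

```java
import java.util.Properties;

class CacheConfigSketch {
    // Builds a Properties object holding the two cache-related settings.
    // All values passed in are illustrative; tune them for your own workload.
    static Properties buildConfig(long commitIntervalMs, long cacheBytes) {
        Properties props = new Properties();
        // Hypothetical placeholders for the application id and broker address.
        props.put("application.id", "commit-interval-demo");
        props.put("bootstrap.servers", "localhost:9092");
        // Time-based trigger: how often processing progress is committed.
        props.put("commit.interval.ms", commitIntervalMs);
        // Size-based trigger: memory available to the record caches.
        props.put("cache.max.bytes.buffering", cacheBytes);
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildConfig(1000L, 10L * 1024 * 1024);
        System.out.println("commit.interval.ms = " + props.get("commit.interval.ms"));
        System.out.println("cache.max.bytes.buffering = " + props.get("cache.max.bytes.buffering"));
    }
}
```

In a real application the same Properties object would be handed to the KafkaStreams constructor, and the StreamsConfig constants shown in the question could replace the string keys.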

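The semantics of caching is that data is flushed to the state store and forwarded to the next downstream processor node whenever the earlier of commit.interval.ms or cache.max.bytes.buffering (cache pressure) hits.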
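
To make the "whichever hits first" rule concrete, here is a toy simulation in plain Java. It is not how Kafka Streams is implemented: the class ToyRecordCache, its methods, and the simulated clock are invented for illustration, and an entry-count limit stands in for the byte limit of cache.max.bytes.buffering. It only shows how a short interval forwards every update while a long one collapses repeated updates to the same key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only, NOT Kafka Streams internals.
class ToyRecordCache {
    private final long commitIntervalMs;
    private final int maxEntries; // stand-in for cache.max.bytes.buffering
    private final Map<String, Long> store = new LinkedHashMap<>(); // "state store"
    private final Map<String, Long> dirty = new LinkedHashMap<>(); // cached updates
    private final List<String> emitted = new ArrayList<>();        // sent downstream
    private long lastFlushMs = 0;

    ToyRecordCache(long commitIntervalMs, int maxEntries) {
        this.commitIntervalMs = commitIntervalMs;
        this.maxEntries = maxEntries;
    }

    // Increments the count for a key at simulated time nowMs, then flushes
    // if either the commit interval has elapsed or the cache is full.
    void increment(String key, long nowMs) {
        long updated = dirty.getOrDefault(key, store.getOrDefault(key, 0L)) + 1;
        dirty.put(key, updated);
        boolean intervalElapsed = nowMs - lastFlushMs >= commitIntervalMs;
        boolean cacheFull = dirty.size() >= maxEntries;
        if (intervalElapsed || cacheFull) {
            flush(nowMs);
        }
    }

    // Writes cached values to the store and forwards one record per cached key.
    void flush(long nowMs) {
        for (Map.Entry<String, Long> e : dirty.entrySet()) {
            store.put(e.getKey(), e.getValue());
            emitted.add(e.getKey() + "=" + e.getValue());
        }
        dirty.clear();
        lastFlushMs = nowMs;
    }

    // Replays six updates, one per simulated millisecond, and returns the output.
    static List<String> replay(long commitIntervalMs, int maxEntries) {
        ToyRecordCache cache = new ToyRecordCache(commitIntervalMs, maxEntries);
        String[] keys = {"a", "a", "b", "a", "b", "a"};
        for (int i = 0; i < keys.length; i++) {
            cache.increment(keys[i], i + 1);
        }
        cache.flush(keys.length); // final flush, as on shutdown
        return cache.emitted;
    }

    public static void main(String[] args) {
        System.out.println("1 ms interval:    " + replay(1, 100));
        System.out.println("10 s interval:    " + replay(10_000, 100));
        System.out.println("10 s, tiny cache: " + replay(10_000, 2));
    }
}
```

In this toy model a 1 ms interval forwards all six updates, a 10 second interval forwards only the final value per key, and a small cache forces flushes even though the interval has not elapsed. That is the trade-off described above: lower values give more, fresher output, while higher values give fewer, consolidated records.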