I'm using both Kafka and Kafka Streams in a Spring application. While regular Kafka communication (sending to / receiving from topics) works fine, Kafka Streams stops consuming (processing records) shortly after the application starts. Strangely, when I restart the application, processing resumes for a short while and some processed data is even sent to the output topic, but then Kafka Streams gets stuck again.
I'm pretty sure I'm missing something, probably an incorrect Kafka Streams configuration or an incorrect way of consuming.
Broker configuration: I have a cluster of 3 brokers, and I'm mostly using the default Kafka broker configuration that ships with the binaries, except that I raised the number of partitions per topic to 3.
The only settings I changed (also because of the recommended cluster size):
num.partitions=3
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
Versions and OS:
I'm using Kafka Streams v2.1.0, Kafka clients v2.1.0, Kafka brokers v2.1.0, and Spring Kafka 2.2.3.RELEASE.
Brokers and consumers run on Debian GNU/Linux 9.8 (stretch).
Kafka Streams Java configuration:
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, servers);
props.put(StreamsConfig.APPLICATION_ID_CONFIG, STREAMS_ID);
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, WallclockTimestampExtractor.class.getName());
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 15);
props.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG, StreamExceptionHandler.class);
props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG, LogAndContinueExceptionHandler.class);
I set the number of stream threads to 15, which I sized for optimal concurrency using the following formula:
#num_of_partitions * #num_of_stream_topics
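For reference, a minimal sketch of how a properties map like the one above is typically registered when using Spring Kafka's @EnableKafkaStreams support (illustrative only; the class name and the broker list below are not from my actual code, and the application id is taken from the logs further down):

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.streams.StreamsConfig;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.EnableKafkaStreams;
import org.springframework.kafka.config.KafkaStreamsConfiguration;
import org.springframework.kafka.config.KafkaStreamsDefaultConfiguration;

@Configuration
@EnableKafkaStreams
public class StreamsConfigSketch {

    // Placeholders mirroring the variables used in the snippet above;
    // the broker list is hypothetical, the application id matches the logs below.
    private static final String STREAMS_ID = "coronet-streams";
    private static final String servers = "broker1:9092,broker2:9092,broker3:9092";

    @Bean(name = KafkaStreamsDefaultConfiguration.DEFAULT_STREAMS_CONFIG_BEAN_NAME)
    public KafkaStreamsConfiguration kStreamsConfig() {
        Map<String, Object> props = new HashMap<>();
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, servers);
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, STREAMS_ID);
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 15);
        // ... plus the remaining StreamsConfig entries from the snippet above.
        return new KafkaStreamsConfiguration(props);
    }
}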
An example of how the Kafka Streams consumer is used:
// Source: activity logs keyed by id, read with a custom JSON serde for the value.
KStream<String, ActivityLog> kStream = kStreamBuilder.stream(ServerConstants.KAFKA.ACTIVITY_LOGS_DESTINATION_TOPIC, Consumed.with(Serdes.String(), getActivityLogSerde()));
TriggerSensitivities triggerSensitivities = PredefinedTriggerSensitivities.SOME_TRIGGER;

kStream
    // Keep only valid, supported activity logs that have a matching rule.
    .filter((id, activityLog) ->
        isValidRecord(id, activityLog) &&
        SUPPORTED_EVENT_TYPES.contains(activityLog.getType()) &&
        ruleService.hasRuleForActivity(id, activityLog, ThreatTrigger.TYPE.SOME_TRIGGER))
    // Re-key so that records are grouped per (id, email, trigger, name).
    .selectKey((id, activityLog) -> new SelectedKey(id, activityLog.getEmail(),
        ThreatTrigger.TYPE.SOME_TRIGGER, activityLog.getName()).toString())
    .groupByKey()
    // Collect the logs per key within the trigger's highest time period.
    .windowedBy(TimeWindows.of(Duration.of(triggerSensitivities.getHighestTimePeriod(), ChronoUnit.MILLIS)))
    .aggregate(ArrayList<ActivityLog>::new,
        (selectedKey, activityLog, activityLogs) -> {
            activityLogs.add(activityLog);
            return activityLogs;
        },
        Materialized.with(Serdes.String(),
            Serdes.serdeFrom(new JsonPOJOSerializer<>(), new JsonPOJODeserializer<>(ArrayList.class, ActivityLog.class))))
    .toStream()
    // Drop the window part of the key and evaluate the aggregated logs.
    .selectKey((windowedKey, activityLogs) -> windowedKey.key())
    .mapValues((selectedKey, activityLogs) ->
        ruleService.getMatchedTriggerActivities(triggerSensitivities,
            ThreatTrigger.TYPE.SOME_TRIGGER,
            selectedKey,
            activityLogs))
    .to(ServerConstants.KAFKA.DETECTION_EVENTS_TOPIC);
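For completeness, a minimal sketch of where a kStreamBuilder like the one above typically comes from with @EnableKafkaStreams (illustrative only; the class and method names are not from my actual code, and getActivityLogSerde() is the same helper referenced above):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ActivityLogTopologySketch {

    // Illustrative only: with @EnableKafkaStreams, Spring exposes a StreamsBuilder
    // bean that can be injected here, and Spring starts/stops the KafkaStreams instance.
    @Bean
    public KStream<String, ActivityLog> activityLogTopology(StreamsBuilder kStreamBuilder) {
        KStream<String, ActivityLog> kStream = kStreamBuilder.stream(
                ServerConstants.KAFKA.ACTIVITY_LOGS_DESTINATION_TOPIC,
                Consumed.with(Serdes.String(), getActivityLogSerde()));
        // ... the filter / selectKey / groupByKey / windowedBy / aggregate chain shown above ...
        return kStream;
    }
}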
What I'm trying to achieve here:
I receive various activity logs, filter them by several conditions, and then aggregate them by a specific key within a certain time window; if enough logs have been aggregated (plus some extra logic), an event is generated. This works as expected at first, and then it hangs.
Broker logs:
A bunch of INFO logs that all look pretty much the same to me:
[2019-02-26 22:25:13,502] INFO [Log partition=coronet-streams-KSTREAM-AGGREGATE-STATE-STORE-0000000200-repartition-1, dir=/tmp/kafka-logs] Incrementing log start offset to 3467 (kafka.log.Log)
[2019-02-26 22:25:55,241] INFO [Log partition=coronet-streams-KSTREAM-AGGREGATE-STATE-STORE-0000000200-repartition-1, dir=/tmp/kafka-logs] Incrementing log start offset to 3470 (kafka.log.Log)
[2019-02-26 22:26:31,133] INFO [Log partition=coronet-streams-KSTREAM-AGGREGATE-STATE-STORE-0000000200-repartition-1, dir=/tmp/kafka-logs] Incrementing log start offset to 3471 (kafka.log.Log)
[2019-02-26 22:27:07,845] INFO [ProducerStateManager partition=coronet-streams-KSTREAM-AGGREGATE-STATE-STORE-0000000292-repartition-1] Writing producer snapshot at offset 2127 (kafka.log.ProducerStateManager)
[2019-02-26 22:27:07,845] INFO [Log partition=coronet-streams-KSTREAM-AGGREGATE-STATE-STORE-0000000292-repartition-1, dir=/tmp/kafka-logs] Rolled new log segment at offset 2127 in 1 ms. (kafka.log.Log)
[2019-02-26 22:34:32,835] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
Consumer logs:
The following log is printed from time to time, but I later found from multiple sources that this WARN does not affect processing.
2019-02-26 22:44:19.291 WARN 7350 --- [coronet-streams-6553b7a0-b6fb-4e07-ad16-c040374e201e-StreamThread-4] o.a.k.s.p.i.ProcessorStateManager : task [0_0] Failed to write offset checkpoint file to /tmp/kafka-streams/coronet-streams/0_0/.checkpoint: {}
java.io.FileNotFoundException: /tmp/kafka-streams/coronet-streams/0_0/.checkpoint.tmp (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
at org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:79)
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.checkpoint(ProcessorStateManager.java:293)
at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:446)
at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:431)
at org.apache.kafka.streams.processor.internals.AssignedTasks.commit(AssignedTasks.java:346)
at org.apache.kafka.streams.processor.internals.TaskManager.commitAll(TaskManager.java:405)
at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:1029)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:883)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:777)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:747)
Apart from that, I do see some logs printed here and there by the stream threads via the application logger, so it seems the streams sometimes work (with some long delays), but mostly they are stuck for some reason!
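In case it's relevant: judging by the path in the stack trace, the state directory is the default /tmp/kafka-streams and I haven't overridden it. For reference, a minimal sketch of how it could be moved elsewhere via StreamsConfig (the path below is just an example, not something I currently set):

// Hypothetical override, not part of my current configuration:
// keep the Streams state directory outside /tmp.
props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams");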
If anyone could point out what the problem might be, it would be a huge help!
Thanks!