Kafka Streams stops consuming shortly after application startup

Time: 2019-02-26 22:56:46

Tags: apache-kafka kafka-consumer-api apache-kafka-streams kafka-producer-api spring-kafka

I'm using both plain Kafka and Kafka Streams in a Spring application. While the regular Kafka communication (e.g., sending to / receiving from topics) works fine, Kafka Streams stops consuming (processing records) shortly after the application starts. Strangely, when I restart the application, record processing resumes for a short while and some processed data is even sent to a topic, but then Kafka Streams gets stuck again.
I'm pretty sure I'm missing something, possibly an incorrect Kafka Streams configuration or an incorrect way of consuming.

Broker configuration: I have a cluster of 3 brokers. I'm mostly running the default Kafka broker configuration that ships with the binaries, except that I raised the number of partitions per topic to 3.

The only settings I changed (also because of the recommended cluster size):

num.partitions=3
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3

Versions and OS:
I'm using Kafka Streams v2.1.0, Kafka clients v2.1.0, Kafka brokers v2.1.0, and Spring Kafka 2.2.3.RELEASE.

The brokers and the consumers run on Debian GNU/Linux 9.8 (stretch).

Kafka Streams Java configuration:

        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, servers);
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, STREAMS_ID);
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, WallclockTimestampExtractor.class.getName());
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 15);
        props.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG, StreamExceptionHandler.class);
        props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG, LogAndContinueExceptionHandler.class);
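
For completeness, this is roughly how the topology gets built and started (a simplified sketch rather than my exact code; `kStreamBuilder` is the same builder used in the consumer snippet further below, and the state listener / uncaught exception handler are only there so I can see whether stream threads die or rebalance silently):

        StreamsBuilder kStreamBuilder = new StreamsBuilder();
        // ... topology definitions such as the one shown further below ...

        KafkaStreams streams = new KafkaStreams(kStreamBuilder.build(), props);

        // Log state transitions (e.g. RUNNING -> REBALANCING) to see when consumption stalls
        streams.setStateListener((newState, oldState) ->
                System.out.println("Streams state changed from " + oldState + " to " + newState));

        // Log any exception that silently kills a stream thread (must be set before start())
        streams.setUncaughtExceptionHandler((thread, throwable) ->
                System.err.println("Stream thread " + thread.getName() + " died: " + throwable));

        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));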

I set 15 stream threads for what I assumed would be optimal concurrency, using the following logic:
#num_of_partitions * #num_of_stream_topics
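
Spelled out (just an illustration; the 5 here stands for the number of stream source topics in my application, which is what makes the product 15):

        // 3 partitions per topic * 5 stream source topics = 15 stream threads
        int numStreamThreads = 3 * 5;
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, numStreamThreads);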

An example of the Kafka Streams consumer's behavior:

    KStream<String, ActivityLog> kStream = kStreamBuilder.stream(ServerConstants.KAFKA.ACTIVITY_LOGS_DESTINATION_TOPIC, Consumed.with(Serdes.String(), getActivityLogSerde()));

    TriggerSensitivities triggerSensitivities = PredefinedTriggerSensitivities.SOME_TRIGGER;

    kStream
            .filter((id, activityLog) ->
                    isValidRecord(id, activityLog) &&
                            SUPPORTED_EVENT_TYPES.contains(activityLog.getType())
                            && ruleService.hasRuleForActivity(id, activityLog, ThreatTrigger.TYPE.SOME_TRIGGER))

            .selectKey((id, activityLog) -> new SelectedKey(id, activityLog.getEmail(),
                    ThreatTrigger.TYPE.SOME_TRIGGER, activityLog.getName()).toString())
            .groupByKey()
            .windowedBy(TimeWindows.of(Duration.of(triggerSensitivities.getHighestTimePeriod(), ChronoUnit.MILLIS)))
            .aggregate(ArrayList<ActivityLog>::new,
                    (selectedKey, activityLog, activityLogs) -> {
                        activityLogs.add(activityLog);
                        return activityLogs;
                    },
                    Materialized.with(Serdes.String(),
                            Serdes.serdeFrom(new JsonPOJOSerializer<>(), new JsonPOJODeserializer<>(ArrayList.class, ActivityLog.class))))
            .toStream()
            .selectKey((windowedKey, activityLogs) -> windowedKey.key())
            .mapValues((selectedKey, activityLogs) ->
                    ruleService.getMatchedTriggerActivities(triggerSensitivities,
                            ThreatTrigger.TYPE.SOME_TRIGGER,
                            selectedKey,
                            activityLogs))
            .to(ServerConstants.KAFKA.DETECTION_EVENTS_TOPIC);

What I'm trying to achieve here:
I receive various activity logs, filter them by several conditions, and then aggregate them by a specific key over a certain time window; if enough logs have been aggregated (plus some additional logic), an event is produced. At first it works as expected, and then it hangs.

Broker logs:
A bunch of INFO logs that look pretty much the same to me:

[2019-02-26 22:25:13,502] INFO [Log partition=coronet-streams-KSTREAM-AGGREGATE-STATE-STORE-0000000200-repartition-1, dir=/tmp/kafka-logs] Incrementing log start offset to 3467 (kafka.log.Log)
[2019-02-26 22:25:55,241] INFO [Log partition=coronet-streams-KSTREAM-AGGREGATE-STATE-STORE-0000000200-repartition-1, dir=/tmp/kafka-logs] Incrementing log start offset to 3470 (kafka.log.Log)
[2019-02-26 22:26:31,133] INFO [Log partition=coronet-streams-KSTREAM-AGGREGATE-STATE-STORE-0000000200-repartition-1, dir=/tmp/kafka-logs] Incrementing log start offset to 3471 (kafka.log.Log)
[2019-02-26 22:27:07,845] INFO [ProducerStateManager partition=coronet-streams-KSTREAM-AGGREGATE-STATE-STORE-0000000292-repartition-1] Writing producer snapshot at offset 2127 (kafka.log.ProducerStateManager)
[2019-02-26 22:27:07,845] INFO [Log partition=coronet-streams-KSTREAM-AGGREGATE-STATE-STORE-0000000292-repartition-1, dir=/tmp/kafka-logs] Rolled new log segment at offset 2127 in 1 ms. (kafka.log.Log)
[2019-02-26 22:34:32,835] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)

Consumer logs:
The following log is printed from time to time, but I have since read in several sources that this WARN should not affect processing.

2019-02-26 22:44:19.291  WARN 7350 --- [coronet-streams-6553b7a0-b6fb-4e07-ad16-c040374e201e-StreamThread-4] o.a.k.s.p.i.ProcessorStateManager        : task [0_0] Failed to write offset checkpoint file to /tmp/kafka-streams/coronet-streams/0_0/.checkpoint: {}

java.io.FileNotFoundException: /tmp/kafka-streams/coronet-streams/0_0/.checkpoint.tmp (No such file or directory)
        at java.io.FileOutputStream.open0(Native Method)
        at java.io.FileOutputStream.open(FileOutputStream.java:270)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
        at org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:79)
        at org.apache.kafka.streams.processor.internals.ProcessorStateManager.checkpoint(ProcessorStateManager.java:293)
        at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:446)
        at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:431)
        at org.apache.kafka.streams.processor.internals.AssignedTasks.commit(AssignedTasks.java:346)
        at org.apache.kafka.streams.processor.internals.TaskManager.commitAll(TaskManager.java:405)
        at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:1029)
        at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:883)
        at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:777)
        at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:747)
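
For what it's worth, the checkpoint path in that WARN comes from the Kafka Streams state directory, which defaults to /tmp/kafka-streams. A sketch of how it could be moved out of /tmp (just something I'm considering because tmp cleaners might remove those files, not a change I've made yet):

        // Assumption: point state.dir away from /tmp so checkpoint/state files
        // can't be removed by tmp cleanup (the default is /tmp/kafka-streams)
        props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams");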

Apart from that, I can see a few logs printed here and there by the stream threads via the application logger, so it seems the streams do work occasionally (with some long delays), but for the most part they're stuck, and I have no idea why!

It would help a lot if someone could point out what might be wrong!
Thanks!

0 Answers:

There are no answers yet.