Long delay before messages reach transform in a kstreams application

Time: 2019-10-10 20:04:52

Tags: java apache-kafka-streams

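I have a very basic use case for a kstreams application: I debounce messages for a couple of seconds, then use transform to either delete the message from a state store or store it there. I also have a punctuate method that fires every 30 seconds to iterate over the store and emit the messages.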

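What I'm finding is that the time between my application receiving a message and that message reaching the transform function is much longer than I expected (I assumed transform would be called shortly after the window expires). For my use case this isn't really a problem, but I'm curious why it can take so long for a message to get to transform.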

    final StreamsBuilder builder = new StreamsBuilder();
    final StoreBuilder<KeyValueStore<String, Payload>> store = Stores.keyValueStoreBuilder(
            Stores.inMemoryKeyValueStore(keyValueStoreName),
            Serdes.String(),
            avroSerde
    );
    builder.addStateStore(store);

    final Consumed<String, Payload> consumed = Consumed.with(Serdes.String(), avroSerde)
            .withTimestampExtractor(new WallclockTimestampExtractor());
    final Produced<String, Payload> produced = Produced.with(Serdes.String(), avroSerde);
    final KStream<String, Payload> stream = builder.stream(inputTopic, consumed);
    final SessionWindows sessionWindows = SessionWindows
            .with(Duration.ofSeconds(2));
    final SessionWindowTransformerSupplier transformerSupplier =
            new SessionWindowTransformerSupplier(keyValueStoreName, scheduleTimeSeconds);
    final SessionBytesStoreSupplier sessionBytesStoreSupplier = Stores.persistentSessionStore(
            "debounce-window",
            Duration.ofSeconds(3));
    final Materialized<String, Payload, SessionStore<Bytes, byte[]>> materializedAs =
            Materialized.as(sessionBytesStoreSupplier);

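    stream
            // selectKey() marks the stream for repartitioning even though the key is
            // unchanged, so groupByKey() goes through an internal repartition topic
            .selectKey((key, value) -> {
                logger.info("selecting key: {}", key);
                return key;
            })
            .groupByKey()
            .windowedBy(sessionWindows)
            // reduce() results are cached in front of the session store by default and
            // are forwarded downstream only when that cache is flushed
            .reduce(payloadDebounceFunction::apply, materializedAs)
            .toStream()
            .transform(transformerSupplier, keyValueStoreName)
            .to(outputTopic, produced);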

    return builder;

Here are my transform/punctuate methods:

    @Override
    public void init(ProcessorContext context) {
        this.processorContext = context;
        this.store = (KeyValueStore<String, Payload>) context.getStateStore(keyValueStoreName);
        // Wall-clock punctuation: fires every scheduleTime seconds, independent of incoming records
        context.schedule(ofSeconds(scheduleTime), WALL_CLOCK_TIME, timestamp -> punctuate());
    }

    @Override
    public KeyValue<String, Payload> transform(Windowed<String> key, Payload value) {
        synchronized (this) {
            if (value != null) {
                BatchScanStatus status = extractStatus(value);
                boolean removeFromStoreStatus = BatchScanStatus.CANCELLED.equals(status)
                        || BatchScanStatus.FINALIZING.equals(status);

                if (removeFromStoreStatus) {
                    logger.info("Deleting key from store: {}", key);
                    store.delete(key.key());
                } else {
                    logger.info("Adding key to store: {}", key);
                    store.putIfAbsent(key.key(), value);
                }
                // Only requests a commit; it is not performed synchronously here
                processorContext.commit();
            }
            // Nothing is emitted from transform; records are forwarded from punctuate()
            return null;
        }
    }

    private void punctuate() {
        synchronized (this) {
            // try-with-resources closes the store iterator even if forward() throws
            try (final KeyValueIterator<String, Payload> keyIter = store.all()) {
                while (keyIter.hasNext()) {
                    final KeyValue<String, Payload> record = keyIter.next();
                    logger.info("Forwarding key: {}", record.key);
                    processorContext.forward(record.key, record.value);
                }
            }
        }
    }

What confuses me is how long it takes to get from the selectKey function to the transform function. In this run it took about 24 seconds:

    15:58:35.238 [scheduler-79112bd0-2310-482e-9aab-8bcaae746082-StreamThread-1] INFO  c.b.d.f.s.kstreams.Scheduler - selecting key: keykeykey
    15:58:59.181 [scheduler-79112bd0-2310-482e-9aab-8bcaae746082-StreamThread-1] INFO  c.b.d.f.s.k.s.SessionTransformer - Adding key to store: [keykeykey@1570737515238/1570737515238]
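The two log lines can be cross-checked with a little arithmetic. The sketch below uses only java.time (no Kafka dependency) to compute the gap between the two timestamps and to decode the window bounds printed inside the Windowed key, which are epoch milliseconds. LogDelay and its methods are hypothetical names for illustration, and the UTC-4 offset is an assumption about the zone the logging JVM ran in (for example US Eastern daylight time in October).

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneOffset;

public class LogDelay {
    // Elapsed milliseconds between two same-day log timestamps written as HH:mm:ss.SSS.
    static long delayMillis(String from, String to) {
        return Duration.between(LocalTime.parse(from), LocalTime.parse(to)).toMillis();
    }

    // A Windowed key prints as [key@start/end], where start and end are epoch milliseconds.
    static Instant windowStart(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    public static void main(String[] args) {
        System.out.println("selectKey -> transform: "
                + delayMillis("15:58:35.238", "15:58:59.181") + " ms");

        Instant start = windowStart(1570737515238L);
        System.out.println("window start (UTC): " + start);
        // Assumption: the logs were written by a JVM running at UTC-4.
        System.out.println("window start (UTC-4): "
                + start.atOffset(ZoneOffset.ofHours(-4)).toLocalTime());
    }
}
```

Running it prints a delay of 23943 ms and a window start of 2019-10-10T19:58:35.238Z, which is 15:58:35.238 at UTC-4, the same millisecond as the selectKey line. That suggests WallclockTimestampExtractor stamped the record on arrival as intended, so the record timestamp itself is not what accounts for the roughly 24 s gap.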

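Is Kafka Streams doing more work behind the scenes than what's shown here that would make something like this take so long? I'd appreciate some insight into whether this is a configuration/timing issue or just normal behavior for a kstreams application.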

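Edit: I think I've found what was going wrong. It has to do with the default value of commit.interval.ms.

Changes aren't committed to the internal topics until the commit interval elapses, so my transform function doesn't fire until those changes reach the internal topics. I lowered the interval to one second and saw the difference immediately.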
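A minimal sketch of the relevant settings, assuming the app builds its configuration as a java.util.Properties. As far as I understand Kafka Streams' caching, the reduce() output is written to a record cache in front of the session store (caching is enabled by default for Materialized stores), and cached updates are forwarded downstream only when the cache is flushed, which happens on every commit or when the cache runs out of space. With commit.interval.ms at its default of 30000 ms under at-least-once processing, a wait of around 24 s before transform fires fits that picture. The class and method names below are hypothetical; commit.interval.ms and cache.max.bytes.buffering are real Kafka Streams config keys (newer releases deprecate the latter in favor of statestore.cache.max.bytes).

```java
import java.util.Properties;

public class LowLatencyStreamsConfig {
    // Hypothetical helper: builds only the latency-related overrides discussed above.
    static Properties latencyOverrides(long commitIntervalMs, boolean disableCache) {
        Properties props = new Properties();
        // How often Kafka Streams commits, which also flushes the record caches
        // (default 30000 ms under at_least_once).
        props.put("commit.interval.ms", Long.toString(commitIntervalMs));
        if (disableCache) {
            // 0 bytes turns off record caching for all stores of this instance,
            // so every update is forwarded downstream immediately.
            props.put("cache.max.bytes.buffering", "0");
        }
        return props;
    }

    public static void main(String[] args) {
        // In a real app these would be merged with application.id, bootstrap.servers, etc.
        Properties props = latencyOverrides(1000L, false);
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Lowering the commit interval trades latency for more frequent commits and cache flushes. Caching can also be turned off for just the debounce store with Materialized#withCachingDisabled() instead of globally, at the cost of forwarding every intermediate update of the session window to transform.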
