Kafka Streams 2.1.1 class cast exception while flushing timed aggregation to store

Date: 2019-03-15 16:15:50

Tags: apache-kafka apache-kafka-streams

I am trying to perform a windowed aggregation with Kafka Streams and emit the result only after a given session window closes. To do this I am using the suppress feature.

The problem is that I have not found a way to make this simple test work: when the state is flushed, I get a ClassCastException because it tries to cast a Windowed to a String. I tried supplying a Materialized<Windowed<String>,Long,StateStore<>> to the aggregate function, but it does not type-check, because it expects the first type parameter to be just String.
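For reference, this is roughly the shape of what I tried (illustrative only, it does not compile; aggregate insists that the Materialized key type be the plain String):

// Illustrative only, this does not compile: aggregate expects the
// Materialized key type parameter to be String, not Windowed<String>.
// ("agg-store" is just a placeholder name.)
input
    .groupByKey()
    .windowedBy(SessionWindows.with(Duration.ofSeconds(30)))
    .aggregate(() -> Long.valueOf(0),
               (key, v1, v2) -> v1 + v2,
               (key, agg1, agg2) -> agg1 + agg2,
               Materialized.<Windowed<String>, Long, SessionStore<Bytes, byte[]>>as("agg-store")); // type error here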

What am I missing here?

Kafka version: 2.1.1

package test;

import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.SessionWindows;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.test.ConsumerRecordFactory;
import org.junit.Test;

import java.text.MessageFormat;
import java.time.Duration;
import java.util.Properties;

public class TestAggregation {

    @Test
    public void aggregationTest() {
        StreamsBuilder streamsBuilder = new StreamsBuilder();
        KStream<String, Long> input = streamsBuilder.stream("input");

        input
            .groupByKey()
            .windowedBy(SessionWindows.with(Duration.ofSeconds(30)))
            .aggregate(() -> Long.valueOf(0), (key, v1, v2) -> v1 + v2, (key, agg1, agg2) -> agg1 + agg2)
            .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
            .toStream()
            .map((k, v) -> new KeyValue<>(k.key(), v))
            .to("output");

        Topology topology = streamsBuilder.build();

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "test");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Long().getClass().getName());

        TopologyTestDriver testDriver = new TopologyTestDriver(topology, props);

        ConsumerRecordFactory<String, Long> producer =
            new ConsumerRecordFactory<>("input", Serdes.String().serializer(), Serdes.Long().serializer());

        testDriver.pipeInput(producer.create("input", "key", 10L));

        ProducerRecord<String, Long> output = testDriver.readOutput("output", Serdes.String().deserializer(), Serdes.Long().deserializer());
        System.out.println(MessageFormat.format("output: k: {0}, v:{1}", output.key(), output.value()));
    }
}

Here is the stack trace I get:

17:05:38.925 [main] DEBUG org.apache.kafka.streams.processor.internals.StreamTask - task [0_0] Committing
17:05:38.925 [main] DEBUG org.apache.kafka.streams.processor.internals.ProcessorStateManager - task [0_0] Flushing all stores registered in the state manager
17:05:38.929 [main] ERROR org.apache.kafka.streams.processor.internals.ProcessorStateManager - task [0_0] Failed to flush state store KSTREAM-AGGREGATE-STATE-STORE-0000000001: 
java.lang.ClassCastException: org.apache.kafka.streams.kstream.Windowed cannot be cast to java.lang.String
    at org.apache.kafka.common.serialization.StringSerializer.serialize(StringSerializer.java:28)
    at org.apache.kafka.streams.kstream.internals.suppress.KTableSuppressProcessor.buffer(KTableSuppressProcessor.java:86)
    at org.apache.kafka.streams.kstream.internals.suppress.KTableSuppressProcessor.process(KTableSuppressProcessor.java:78)
    at org.apache.kafka.streams.kstream.internals.suppress.KTableSuppressProcessor.process(KTableSuppressProcessor.java:37)
    at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:117)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:146)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:129)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:93)
    at org.apache.kafka.streams.kstream.internals.ForwardingCacheFlushListener.apply(ForwardingCacheFlushListener.java:42)
    at org.apache.kafka.streams.state.internals.CachingSessionStore.putAndMaybeForward(CachingSessionStore.java:179)
    at org.apache.kafka.streams.state.internals.CachingSessionStore.access$000(CachingSessionStore.java:37)
    at org.apache.kafka.streams.state.internals.CachingSessionStore$1.apply(CachingSessionStore.java:86)
    at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:141)
    at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:99)
    at org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:124)
    at org.apache.kafka.streams.state.internals.CachingSessionStore.flush(CachingSessionStore.java:198)
    at org.apache.kafka.streams.state.internals.MeteredSessionStore.flush(MeteredSessionStore.java:191)
    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:217)
    at org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:204)
    at org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:491)
    at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:443)
    at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:431)
    at org.apache.kafka.streams.TopologyTestDriver.pipeInput(TopologyTestDriver.java:405)
    at test.TestAggregation.aggregationTest(TestAggregation.java:49)

2 Answers:

Answer 0 (score: 2)

There are two ways to resolve the problem:

  1. Use SessionWindowedKStream::aggregate(final Initializer<VR> initializer, final Aggregator<? super K, ? super V, VR> aggregator, final Merger<? super K, VR> sessionMerger, final Materialized<K, VR, SessionStore<Bytes, byte[]>> materialized); (the question uses session windows, so the overload takes a session merger and a SessionStore, and Materialized is keyed by the plain key type K)

  2. Use KStream::groupByKey(final Grouped<K, V> grouped) with explicit serdes

In this case, that would be:

Option 1:

input
    .groupByKey()
    .windowedBy(SessionWindows.with(Duration.ofSeconds(30)))
    .aggregate(() -> Long.valueOf(0), (key, v1, v2) -> v1 + v2, (key, agg1, agg2) -> agg1 + agg2, Materialized.with(Serdes.String(), Serdes.Long()))
    .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
    .toStream()
    .map((k, v) -> new KeyValue<>(k.key(), v))
    .to("output");

Option 2:

input
    .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
    .windowedBy(SessionWindows.with(Duration.ofSeconds(30)))
    .aggregate(() -> Long.valueOf(0), (key, v1, v2) -> v1 + v2, (key, agg1, agg2) -> agg1 + agg2)
    .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
    .toStream()
    .map((k, v) -> new KeyValue<>(k.key(), v))
    .to("output");

Answer 1 (score: 0)

To make this work with the TopologyTestDriver, you would need to advance wall-clock time, which does not seem to have any effect on the suppression step. One workaround is to let your test override the suppress configuration with a setting like this:

Suppressed.untilTimeLimit(Duration.ZERO, Suppressed.BufferConfig.unbounded())
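A minimal sketch of one way to wire that in, assuming you extract the topology into a helper (buildTopology and its parameter are illustrative names, not from the original post; this version also folds in the explicit serdes from answer 0):

// Same imports as the test above, plus org.apache.kafka.streams.kstream.Grouped
// and org.apache.kafka.streams.kstream.Windowed.
static Topology buildTopology(Suppressed<? super Windowed<String>> suppressConfig) {
    StreamsBuilder streamsBuilder = new StreamsBuilder();
    streamsBuilder.<String, Long>stream("input")
        .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
        .windowedBy(SessionWindows.with(Duration.ofSeconds(30)))
        .aggregate(() -> Long.valueOf(0), (key, v1, v2) -> v1 + v2, (key, agg1, agg2) -> agg1 + agg2)
        .suppress(suppressConfig)  // suppression strategy is now injectable
        .toStream()
        .map((k, v) -> new KeyValue<>(k.key(), v))
        .to("output");
    return streamsBuilder.build();
}

// Production keeps the original semantics:
//   buildTopology(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()));
// The test passes the zero time limit instead:
//   buildTopology(Suppressed.untilTimeLimit(Duration.ZERO, Suppressed.BufferConfig.unbounded()));

With Duration.ZERO the buffer should emit as soon as the next record or flush arrives, so the test can observe output without advancing stream time; the trade-off is that the test no longer exercises the exact untilWindowCloses behavior.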