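Kafka Streams windowed count output is unreadable

Time: 2018-05-01 07:02:09

Tags: apache-kafka apache-kafka-streams

I am trying a windowed count based on the word count example. It works fine, but part of the output is unreadable.

Code: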

    StringSerializer stringSerializer = new StringSerializer();
    StringDeserializer stringDeserializer = new StringDeserializer();
    WindowedSerializer<String> windowedSerializer = new WindowedSerializer<>(stringSerializer);
    WindowedDeserializer<String> windowedDeserializer = new WindowedDeserializer<>(stringDeserializer);
    Serde<Windowed<String>> windowedSerde = Serdes.serdeFrom(windowedSerializer, windowedDeserializer);

    TimeWindows window = TimeWindows.of(TimeUnit.MINUTES.toMillis(1)).advanceBy(TimeUnit.MINUTES.toMillis(1));

    KStream<String, String> textLines = builder.stream("streams-plaintext-input");
    KTable<Windowed<String>, Long> wordCounts = textLines
        .flatMapValues(textLine -> Arrays.asList(textLine.toLowerCase().split("\\W+")))
        .groupBy((key, word) -> word)
        .windowedBy(window)
        .count(Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("counts-store"));
    wordCounts.toStream().to("streams-plaintext-output", Produced.with(windowedSerde, Serdes.Long()));

    KafkaStreams streams = new KafkaStreams(builder.build(), config);
    streams.start();

Output:

    kafka c[??   1
    yaya c[??    1
    kafka c[??   2

I guess the unreadable part could be the window duration. What should I do to make it readable?

Edit

I tried printing the output using windowedSerde:

    KStream<Windowed<String>, Long> output = builder.stream("streams-plaintext-output");
    output.print(windowedSerde, Serdes.Long());

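It still does not work.

1 answer:

Answer 0 (score: 0)

When reading from a topic, you need to use the deserializer that matches the serializer used to write to that topic. In this case, you need to use the windowedDeserializer, which you are already constructing as follows: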

    WindowedDeserializer<String> windowedDeserializer = new WindowedDeserializer<>(stringDeserializer);
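
To see why a plain string reader prints garbage after each word, here is a minimal sketch that uses only the JDK. It assumes the windowed key is laid out as the serialized inner key followed by the window start as an 8-byte big-endian long (epoch milliseconds), which is my understanding of what the windowed serializer writes. The class `WindowedKeyDecoder` and its methods are hypothetical names for illustration, not part of Kafka.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.time.Instant;

public class WindowedKeyDecoder {
    // Assumed layout: [UTF-8 key bytes][8-byte big-endian window start, epoch millis].
    private static final int TIMESTAMP_SIZE = 8;

    /** Builds a key in the assumed layout; stands in for the bytes found in the output topic. */
    public static byte[] encode(String key, long windowStartMs) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(keyBytes.length + TIMESTAMP_SIZE)
                .put(keyBytes)
                .putLong(windowStartMs)
                .array();
    }

    /** Everything except the trailing 8 bytes is the original string key. */
    public static String decodeKey(byte[] raw) {
        return new String(raw, 0, raw.length - TIMESTAMP_SIZE, StandardCharsets.UTF_8);
    }

    /** The trailing 8 bytes are read as the window start timestamp. */
    public static long decodeWindowStart(byte[] raw) {
        return ByteBuffer.wrap(raw).getLong(raw.length - TIMESTAMP_SIZE);
    }

    /** Renders the windowed key in a human-readable form. */
    public static String describe(byte[] raw) {
        return decodeKey(raw) + "@" + Instant.ofEpochMilli(decodeWindowStart(raw));
    }

    public static void main(String[] args) {
        byte[] raw = encode("kafka", 1525158120000L);
        System.out.println(describe(raw)); // prints kafka@2018-05-01T07:02:00Z
    }
}
```

Under that assumption, the unreadable part is the window start timestamp rather than the window duration: a reader that treats the whole key as a string prints those 8 binary bytes as characters. Within Kafka Streams (1.0 or later), one option is to pass the serde when creating the stream, for example `builder.stream("streams-plaintext-output", Consumed.with(windowedSerde, Serdes.Long()))`, and then format `windowedKey.key()` and `windowedKey.window().start()` yourself before printing.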