Question

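I am setting up a Kafka Streams application that consumes from a topic (retention: 14 days, cleanup.policy: delete, partitions: 1). I want to consume these messages and write them to another topic (retention: -1, cleanup.policy: compact, partitions: 3).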

The records should be grouped by the key on the input topic. So, input topic:

Key: A   Value: { SomeJson }
Key: A   Value: { Other Json}
Key: B   Value: { TestJson }

Output:

Key: A   Value: {[ { SomeJson }, { Other Json } ]}
Key: B   Value: {[ { TestJson } ]}
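To make the expected result concrete, here is a minimal plain-Java sketch of the fold that `groupByKey().aggregate(...)` performs: start from an empty list per key and append each incoming value. It does not use Kafka at all; the class name `GroupByKeySketch` and the helper `aggregate` are made up for illustration, and the values are treated as opaque strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByKeySketch {

    // Folds (key, value) pairs into one list per key, preserving arrival order.
    // Each record is a two-element array: {key, value}.
    public static Map<String, List<String>> aggregate(List<String[]> records) {
        Map<String, List<String>> state = new LinkedHashMap<>();
        for (String[] record : records) {
            // Initializer: an empty list the first time a key is seen.
            // Aggregator: append the new value to the current aggregate.
            state.computeIfAbsent(record[0], k -> new ArrayList<>()).add(record[1]);
        }
        return state;
    }

    public static void main(String[] args) {
        List<String[]> input = Arrays.asList(
                new String[] {"A", "{ SomeJson }"},
                new String[] {"A", "{ Other Json }"},
                new String[] {"B", "{ TestJson }"});
        // One entry per key, values in arrival order.
        aggregate(input).forEach((key, values) -> System.out.println(key + " -> " + values));
    }
}
```

One difference from this sketch: Kafka Streams emits an updated aggregate downstream as records arrive (subject to caching), so the output topic can briefly hold older versions of a key's list until compaction keeps only the latest one.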

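It is important that the content on the output topic is never lost, so probably acks: all and a replication factor of 3. Each key in the compacted topic will hold around 100 JSON records, estimated at less than 20 KB per key.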
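The durability settings mentioned above map onto a handful of Kafka Streams configuration keys. The sketch below builds them as a plain `java.util.Properties` using the literal key strings, so it compiles without Kafka on the classpath. The class name, application id, and broker address are placeholders; in real code you would typically use the `StreamsConfig` and `ProducerConfig` constants instead of string literals.

```java
import java.util.Properties;

public class DurableStreamsConfigSketch {

    // Builds Kafka Streams settings as plain key/value pairs.
    public static Properties build(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("application.id", applicationId);
        props.setProperty("bootstrap.servers", bootstrapServers);
        // The "producer." prefix targets the embedded producer:
        // wait for all in-sync replicas to acknowledge each write.
        props.setProperty("producer.acks", "all");
        // Replication factor for the internal topics (changelog, repartition)
        // that Kafka Streams creates itself.
        props.setProperty("replication.factor", "3");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values, not taken from the question.
        build("identhendelser-aggregator", "localhost:9092")
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Note that `replication.factor` here only covers topics created by Kafka Streams. The output topic is a regular topic, so its replication factor, `cleanup.policy=compact`, and retention are set when that topic is created.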

I also want the output topic to act as the state topic, so that I don't have to create another topic containing the same information.

Does anyone know how to do this? Most of the examples I have found are about windowing: https://github.com/confluentinc/kafka-streams-examples/tree/5.3.1-post/src/main/java/io/confluent/examples/streams

Current code:

val mapper = new ObjectMapper();

builder.stream(properties.getInputTopic(), Consumed.with(Serdes.String(), Serdes.String()))
        .groupByKey()
        .aggregate(
                () -> new GroupedIdenthendelser(Collections.emptyList()),
                (key, value, currentAggregate) -> {
                    val items = new ArrayList<>(currentAggregate.getIdenthendelser());
                    items.add(value);

                    return new GroupedIdenthendelser(items);
                },
                Materialized.with(Serdes.String(), new JsonSerde<>(GroupedIdenthendelser.class, mapper)))
        .toStream()
        .to(properties.getOutputTopic(), Produced.with(Serdes.String(), new JsonSerde<>(mapper)));

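If anyone has other tips, please let me know: this data is customer information, so I would like to hear about any configuration I should be aware of. Pointers to relevant blog posts or examples would also be appreciated.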

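Edit: The code sample above seems to work, but it creates its own state topic, which is unnecessary because the output topic will always contain the same state. Since the input topic has 1 partition and the data relates to people, only one application instance will be running. The population is of fixed size (about 100 million people), so the data will not grow beyond 20 KB per person either. The event rate is estimated at about 1 event per second, so the load is light as well.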
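As a rough sanity check on those figures, the sketch below multiplies them out. It is back-of-the-envelope arithmetic only: it takes the population figure at face value, assumes every key reaches the 20 KB ceiling, counts 1 KB as 1,000 bytes, ignores record overhead and compression, and assumes each event re-emits the full aggregate for its key. The class and method names are made up for illustration.

```java
public class SizingSketch {

    // Upper bound on compacted data: one latest value per key, times replicas.
    public static long maxCompactedBytes(long keys, long bytesPerKey, int replicas) {
        return keys * bytesPerKey * replicas;
    }

    // Bytes written per second when each event rewrites the whole aggregate.
    public static long writeBytesPerSecond(long eventsPerSecond, long bytesPerKey, int replicas) {
        return eventsPerSecond * bytesPerKey * replicas;
    }

    public static void main(String[] args) {
        long keys = 100_000_000L;   // population figure as stated in the question
        long bytesPerKey = 20_000L; // 20 KB ceiling per key
        int replicas = 3;           // replication factor from the question
        System.out.println("max compacted bytes: "
                + maxCompactedBytes(keys, bytesPerKey, replicas));
        System.out.println("write bytes per second: "
                + writeBytesPerSecond(1, bytesPerKey, replicas));
    }
}
```

Under these assumptions, a separate changelog topic holding the same key-to-aggregate mapping would roughly double both numbers, which is the cost the question is trying to avoid.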

Topology:

Sub-topology: 0
    Source: KSTREAM-SOURCE-0000000000 (topics: [input-topic])
      --> KSTREAM-AGGREGATE-0000000002
    Processor: KSTREAM-AGGREGATE-0000000002 (stores: [KSTREAM-AGGREGATE-STATE-STORE-0000000001])
      --> KTABLE-TOSTREAM-0000000003
      <-- KSTREAM-SOURCE-0000000000
    Processor: KTABLE-TOSTREAM-0000000003 (stores: [])
      --> KSTREAM-SINK-0000000004
      <-- KSTREAM-AGGREGATE-0000000002
    Sink: KSTREAM-SINK-0000000004 (topic: output-topic)
      <-- KTABLE-TOSTREAM-0000000003

Answer 1

Looking at your sample dataset, I think what you need is a real-time aggregation. Take this blog post from Confluent as a starting point.

Aggregating to a compacted topic with infinite retention
