This is a beginner question about kafka-streams.
How would you collect pairs of messages using the java kafka-streams library and write them to a new output topic?
I was thinking about something like this:
private void accumulateTwo(KStream<String, String> messages) {
    Optional<String> accumulator = Optional.empty();
    messages.mapValues(value -> {
        if (accumulator.isPresent()) {
            String tmp = accumulator.get();
            accumulator = Optional.empty();
            return Optional.of(new Tuple<>(tmp, value));
        } else {
            accumulator = Optional.of(value);
            return Optional.empty();
        }
    }).filter((key, value) -> value.isPresent()).to("pairs");
}
But this will not work, since variables captured by a Java lambda expression must be effectively final.
Any ideas?
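For what it's worth, the capture restriction itself can be worked around in plain Java (no Kafka involved) by mutating an effectively-final holder such as an AtomicReference instead of reassigning a local variable. The following standalone sketch (class and method names are made up for illustration) shows only the pairing logic; as the answers explain, such in-memory state is not fault-tolerant in a streams application:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;

public class PairingSketch {

    // Collect consecutive messages into pairs, emitting one output per two inputs.
    // The lambda-capture restriction is sidestepped by mutating an effectively-final
    // AtomicReference holder instead of reassigning a local variable.
    static List<String> pairUp(List<String> messages) {
        AtomicReference<String> accumulator = new AtomicReference<>();
        List<String> pairs = new ArrayList<>();
        messages.forEach(value -> {
            String previous = accumulator.getAndSet(null);
            if (previous != null) {
                pairs.add("(" + previous + ", " + value + ")");
            } else {
                accumulator.set(value);
            }
        });
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(pairUp(List.of("a", "b", "c", "d"))); // [(a, b), (c, d)]
    }
}
```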
Answer 0 (score: 2)
As suggested in the comments, three additional steps are required:

1. The Transformer must explicitly store its state in a state store. It gets a reference to the state store from the ProcessorContext, which is passed to it in the init method.
2. The state store must be registered with the StreamsBuilder.
3. The name of the state store must be passed in the transform method.

In this example it is sufficient to store the last message we have seen. We use a KeyValueStore for that, which holds either zero or one entries at each point in time.
public class PairTransformerSupplier<K,V> implements TransformerSupplier<K, V, KeyValue<K, Pair<V,V>>> {

    private String storeName;

    public PairTransformerSupplier(String storeName) {
        this.storeName = storeName;
    }

    @Override
    public Transformer<K, V, KeyValue<K, Pair<V, V>>> get() {
        return new PairTransformer<>(storeName);
    }
}
public class PairTransformer<K,V> implements Transformer<K, V, KeyValue<K, Pair<V, V>>> {

    private ProcessorContext context;
    private String storeName;
    private KeyValueStore<Integer, V> stateStore;

    public PairTransformer(String storeName) {
        this.storeName = storeName;
    }

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
        stateStore = (KeyValueStore<Integer, V>) context.getStateStore(storeName);
    }

    @Override
    public KeyValue<K, Pair<V, V>> transform(K key, V value) {
        // 1. Update the store to remember the last message seen.
        if (stateStore.get(1) == null) {
            stateStore.put(1, value);
            return null;
        }
        KeyValue<K, Pair<V,V>> result = KeyValue.pair(key, new Pair<>(stateStore.get(1), value));
        stateStore.put(1, null);
        return result;
    }

    @Override
    public void close() { }
}
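The zero-or-one-entry contract of the store can be checked without a broker. In this standalone sketch (the class name and a HashMap standing in for the KeyValueStore are assumptions for illustration), transform buffers the first message under key 1 and emits a pair on the second, mirroring the method above:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TransformSimulation {

    // Stand-in for the KeyValueStore<Integer, V>: holds at most one entry under key 1.
    static final Map<Integer, String> stateStore = new HashMap<>();

    // Mirrors PairTransformer#transform: buffer the first message, emit on the second.
    static String transform(String value) {
        if (stateStore.get(1) == null) {
            stateStore.put(1, value);
            return null; // nothing is forwarded downstream yet
        }
        String pair = "(" + stateStore.get(1) + ", " + value + ")";
        stateStore.put(1, null); // back to the empty state
        return pair;
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        for (String message : List.of("m1", "m2", "m3", "m4")) {
            String pair = transform(message);
            if (pair != null) out.add(pair);
        }
        System.out.println(out); // [(m1, m2), (m3, m4)]
    }
}
```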
public KStream<String, String> sampleStream(StreamsBuilder builder) {
    KStream<String, String> messages = builder.stream(inputTopic, Consumed.with(Serdes.String(), Serdes.String()));
    // 2. Create the state store and register it with the streams builder.
    KeyValueBytesStoreSupplier store = Stores.persistentKeyValueStore(stateStoreName);
    StoreBuilder storeBuilder = new KeyValueStoreBuilder<>(
            store,
            new Serdes.IntegerSerde(),
            new Serdes.StringSerde(),
            Time.SYSTEM
    );
    builder.addStateStore(storeBuilder);
    transformToPairs(messages);
    return messages;
}
private void transformToPairs(KStream<String, String> messages) {
    // 3. Reference the name of the state store when calling transform(...)
    KStream<String, Pair<String, String>> pairs = messages.transform(
            new PairTransformerSupplier<>(stateStoreName),
            stateStoreName
    );
    KStream<String, Pair<String, String>> filtered = pairs.filter((key, value) -> value != null);
    KStream<String, String> serialized = filtered.mapValues(Pair::toString);
    serialized.to(outputTopic);
}
Changes to the state store can be observed with a console consumer (by default, Kafka Streams names the changelog topic <application-id>-<store-name>-changelog):

./bin/kafka-console-consumer --topic <changelog-topic-name> --bootstrap-server localhost:9092

The complete source code is here: https://github.com/1123/spring-kafka-stream-with-state-store
The JavaDoc of the org.apache.kafka.streams.kstream.ValueMapper interface states that it is meant for stateless record-by-record transformations, whereas the org.apache.kafka.streams.kstream.Transformer interface is meant

    for stateful mapping of an input record to zero, one, or multiple new output records.

Therefore I guess the Transformer interface is the appropriate choice for collecting pairs of messages. Persisting the state in a state store is only relevant in case of failure and restart of the streams application, so that it can recover its state from Kafka.

Hence, here is another solution based on the org.apache.kafka.streams.kstream.Transformer interface:
class PairTransformerSupplier<K,V> implements TransformerSupplier<K, V, KeyValue<K, Pair<V,V>>> {

    @Override
    public Transformer<K, V, KeyValue<K, Pair<V, V>>> get() {
        return new PairTransformer<>();
    }
}

public class PairTransformer<K,V> implements Transformer<K, V, KeyValue<K, Pair<V, V>>> {

    private V left;

    @Override
    public void init(ProcessorContext context) {
        left = null;
    }

    @Override
    public KeyValue<K, Pair<V, V>> transform(K key, V value) {
        if (left == null) {
            left = value;
            return null;
        }
        KeyValue<K, Pair<V,V>> result = KeyValue.pair(key, new Pair<>(left, value));
        left = null;
        return result;
    }

    @Override
    public KeyValue<K, Pair<V, V>> punctuate(long timestamp) {
        return null;
    }

    @Override
    public void close() { }
}
The PairTransformerSupplier is then used as follows:
private void accumulateTwo(KStream<String, String> messages) {
    messages.transform(new PairTransformerSupplier<>())
            .filter((key, value) -> value != null)
            .mapValues(Pair::toString)
            .to("pairs");
}
Trying out both solutions in a single process on a topic with a single partition yields exactly the same results. I have not tried it with a topic with multiple partitions and multiple stream consumers.
Answer 1 (score: 1)
You should be able to write an accumulator class:
class Accumulator implements ValueMapper<String, Optional<Tuple<String>>> {

    private String key;

    @Override
    public Optional<Tuple<String>> apply(String item) {
        if (key == null) {
            key = item;
            return Optional.empty();
        }
        Optional<Tuple<String>> result = Optional.of(new Tuple<>(key, item));
        key = null;
        return result;
    }
}
And then process with:
messages.mapValues(new Accumulator())
        .filter((key, value) -> value.isPresent()) // I don't think your filter is correct
        .to("pairs");
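The Accumulator's logic can also be exercised without a broker. In the sketch below, the Tuple record and the one-method ValueMapper stand-in are assumptions made for illustration (Tuple is not a Kafka Streams class, and the local interface merely mirrors org.apache.kafka.streams.kstream.ValueMapper); four messages are fed through and the present results are kept:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class AccumulatorSketch {

    // Minimal stand-ins: Tuple is not part of Kafka Streams, and this local
    // interface mirrors org.apache.kafka.streams.kstream.ValueMapper.
    record Tuple<T>(T first, T second) { }

    interface ValueMapper<V, VR> { VR apply(V value); }

    static class Accumulator implements ValueMapper<String, Optional<Tuple<String>>> {
        private String key;

        @Override
        public Optional<Tuple<String>> apply(String item) {
            if (key == null) {
                key = item;
                return Optional.empty();
            }
            Optional<Tuple<String>> result = Optional.of(new Tuple<>(key, item));
            key = null;
            return result;
        }
    }

    public static void main(String[] args) {
        Accumulator accumulator = new Accumulator();
        List<Tuple<String>> pairs = new ArrayList<>();
        for (String message : List.of("w", "x", "y", "z")) {
            accumulator.apply(message).ifPresent(pairs::add);
        }
        System.out.println(pairs); // [Tuple[first=w, second=x], Tuple[first=y, second=z]]
    }
}
```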