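I have a Kafka Streams application that operates on incoming state and needs to store that state before writing to the next topic. The write should happen only after the state has been updated in the local store.
Something like this: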
stream.map(this::getAndUpdateState)
.map(this::processStateAndEvent)
.to("topicname");
so that in getAndUpdateState() I can do:
state = store.get(key); // or new if null
state = updateState(state, event); // update changes to state
store.put(key, state); // write back the state
return state;
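The ordering described above can be modeled in plain Java without any Kafka dependency. This is only a sketch: the HashMap is a hypothetical stand-in for the state store, the List stands in for the output topic, and the class name StateUpdateSketch, the method process(), and the integer state are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the pipeline in the question (not Kafka Streams code).
final class StateUpdateSketch {
    // Hypothetical stand-in for the local state store.
    private final Map<String, Integer> store = new HashMap<>();
    // Hypothetical stand-in for the output topic.
    private final List<String> output = new ArrayList<>();

    // Mirrors getAndUpdateState(): read, update, write back, then return.
    int getAndUpdateState(String key, int event) {
        Integer state = store.get(key);   // or new if null
        if (state == null) {
            state = 0;
        }
        state = state + event;            // update changes to state
        store.put(key, state);            // write back the state
        return state;
    }

    // Handles one event: the store is updated before anything is emitted.
    void process(String key, int event) {
        int state = getAndUpdateState(key, event);
        output.add(key + "=" + state);
    }

    List<String> output() {
        return output;
    }

    public static void main(String[] args) {
        StateUpdateSketch sketch = new StateUpdateSketch();
        sketch.process("a", 2);
        sketch.process("b", 5);
        sketch.process("a", 3);
        System.out.println(sketch.output()); // prints [a=2, b=5, a=5]
    }
}
```

Because process() calls getAndUpdateState() first, a record only reaches the output after its state has been written back, which is the guarantee the question asks for.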
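How can I implement simple get() and put() operations on a Kafka state store? I have tried using a KeyValueStore, but I ran into problems because I had to add it as a source processor and a sink processor.
Alternatively, a way to get from and put to Kafka using a KTable or some other concept would also be fine.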
Answer 0 (score: 2)
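Thanks to user152468 and Matthias J. Sax for their suggestions.
I was able to do stateful processing in Kafka Streams using the transform() method. The complete working code, based on the original Pipe example, is given below.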
Pipe.java:
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.*;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.kstream.TransformerSupplier;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;

public class Pipe {
    public static void main(String[] args) throws Exception {
        Properties properties = new Properties();
        // setting configs
        properties.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-pipe");
        properties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        properties.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        properties.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // initializing a StreamsBuilder for building the topology
        final StreamsBuilder builder = new StreamsBuilder();
        // creating a KStream that continuously reads records from its source kafka topic "streams-plaintext-input"
        KStream<String, String> source = builder.stream("streams-plaintext-input");
        // persistent key-value store that holds the running count per key
        StoreBuilder<KeyValueStore<String, Long>> wordCountsStore = Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore("WordCountsStore"),
                Serdes.String(),
                Serdes.Long())
                .withCachingEnabled();
        builder.addStateStore(wordCountsStore);
        source.map((k, v) -> KeyValue.pair("key", v))
                .peek((k, s) -> System.out.printf("After keying: %s, value: %s\n", k, s))
                .transform(new SampleTransformSupplier(wordCountsStore.name()), wordCountsStore.name())
                .peek((k, s) -> System.out.printf("After transform: %s, value: %s\n", k, s))
                // writing the transformed stream (not the untouched source) to another kafka topic
                // "streams-pipe-output", so a record is written only after the store has been updated
                .to("streams-pipe-output");
        // generating the topology
        final Topology topology = builder.build();
        System.out.print(topology.describe());
        // constructing a streams client with the properties and topology
        final KafkaStreams streams = new KafkaStreams(topology, properties);
        final CountDownLatch latch = new CountDownLatch(1);
        // attaching shutdown handler
        Runtime.getRuntime().addShutdownHook(new Thread("streams-shutdown-hook") {
            @Override
            public void run() {
                streams.close();
                latch.countDown();
            }
        });
        try {
            streams.start();
            latch.await();
        } catch (Throwable e) {
            System.exit(1);
        }
        System.exit(0);
    }

    private static class SampleTransformSupplier implements TransformerSupplier<String, String, KeyValue<String, String>> {
        private final String stateStoreName;

        public SampleTransformSupplier(String stateStoreName) {
            this.stateStoreName = stateStoreName;
        }

        @Override
        public Transformer<String, String, KeyValue<String, String>> get() {
            return new Transformer<String, String, KeyValue<String, String>>() {
                private KeyValueStore<String, Long> stateStore;

                @SuppressWarnings("unchecked")
                @Override
                public void init(ProcessorContext processorContext) {
                    // look up the store that was attached to this transformer by name
                    stateStore = (KeyValueStore<String, Long>) processorContext.getStateStore(stateStoreName);
                }

                @Override
                public KeyValue<String, String> transform(String key, String value) {
                    // get: read the current state for this key
                    Long countSoFar = stateStore.get(key);
                    if (countSoFar == null) {
                        System.out.print("Initializing count so far. this message should be printed only once");
                        countSoFar = 0L;
                    }
                    // update: add the length of the incoming value to the running count
                    countSoFar += value.length();
                    System.out.printf(" Key: %s, Value: %s, Count: %d\n\n", key, value, countSoFar);
                    // put: write the state back before the record is forwarded downstream
                    stateStore.put(key, countSoFar);
                    return KeyValue.pair(key, value);
                }

                @Override
                public void close() {
                    // No need to close as this is handled by kafka.
                }
            };
        }
    }
}
Answer 1 (score: 1)
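It sounds like you want to do batch processing. Kafka Streams is a stream processing library, and all processors run concurrently/in parallel to build a data pipeline.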
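I guess you could still use transform() with an attached state store and not emit anything downstream, only putting the data into the store. You could then schedule a wall-clock-time punctuation that scans the whole store and emits all data in the store downstream. Overall, however, this seems to be an anti-pattern.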
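The buffer-then-flush idea can be illustrated with a plain-Java model; this is a rough sketch, not Kafka Streams code. The names BufferingSketch, accept() and punctuate() are hypothetical, the TreeMap stands in for the state store, and the List stands in for the downstream. In a real topology the flush would presumably be a callback registered through ProcessorContext.schedule(...) with PunctuationType.WALL_CLOCK_TIME. Clearing the store after each flush is an assumption; the text above only says to emit everything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of "buffer in the store, emit on a punctuation".
final class BufferingSketch {
    // Hypothetical stand-in for the state store (sorted by key, last value wins).
    private final TreeMap<String, String> store = new TreeMap<>();
    // Hypothetical stand-in for the downstream processors.
    private final List<String> downstream = new ArrayList<>();

    // Called once per record: only store the data, emit nothing.
    void accept(String key, String value) {
        store.put(key, value);
    }

    // Called periodically: scan the whole store and emit every entry.
    void punctuate() {
        for (Map.Entry<String, String> entry : store.entrySet()) {
            downstream.add(entry.getKey() + "=" + entry.getValue());
        }
        store.clear(); // assumption: avoid re-emitting the same entries next time
    }

    List<String> downstream() {
        return downstream;
    }

    public static void main(String[] args) {
        BufferingSketch sketch = new BufferingSketch();
        sketch.accept("b", "1");
        sketch.accept("a", "2");
        sketch.accept("b", "3");
        System.out.println(sketch.downstream()); // prints []
        sketch.punctuate();
        System.out.println(sketch.downstream()); // prints [a=2, b=3]
    }
}
```

The model also shows why this resembles batching: nothing reaches the downstream until the timer fires, and the timer has no way of knowing whether more records for the same keys are still on their way.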
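What is hard to reason about is when the state is "fully loaded": because topics are by definition/conceptually infinite, loading the state would "never" finish.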