I have a pipeline that reads from Kafka via KafkaIO, running on the Direct Runner:
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Values;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

PipelineOptions options = PipelineOptionsFactory.as(PipelineOptions.class);
Pipeline pipeline = Pipeline.create(options);

Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.GROUP_ID_CONFIG, "tracker-statistics-group");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);

PTransform<PBegin, PCollection<KV<String, String>>> kafkaIo = KafkaIO.<String, String>read()
        .withBootstrapServers(bootstrapAddress)
        .withTopic(topic)
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        .updateConsumerProperties(props)
        .withReadCommitted()
        // offsets consumed by the pipeline can be committed back
        .commitOffsetsInFinalize()
        .withoutMetadata();

pipeline
        .apply(kafkaIo)
        .apply(Values.create())
        .apply("ParseEvent", ParDo.of(new ParseEventFn()))
        .apply("test", ParDo.of(new PrintFn()));

pipeline.run();
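For context, ParseEventFn and PrintFn are ordinary DoFns. A simplified sketch of what they look like is below (the real parsing logic is omitted; the runtime exception I describe next is thrown from inside a DoFn like these):

import org.apache.beam.sdk.transforms.DoFn;

// Simplified stand-ins for my actual DoFns.
static class ParseEventFn extends DoFn<String, String> {
    @ProcessElement
    public void processElement(ProcessContext c) {
        // The real code parses the payload; here it is just passed through.
        c.output(c.element());
    }
}

static class PrintFn extends DoFn<String, String> {
    @ProcessElement
    public void processElement(ProcessContext c) {
        System.out.println(c.element());
        // In my real code a RuntimeException can be thrown here, which crashes the pipeline.
        c.output(c.element());
    }
}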
Every time the consumer receives a message, the offset moves forward, even though ENABLE_AUTO_COMMIT_CONFIG is set to false. So when my pipeline crashes with a runtime exception, I can no longer read that message again, because its offset has already been committed. I thought .commitOffsetsInFinalize() guaranteed that offsets are committed back only after the pipeline has finished processing the messages. How can I get that behavior? Is there any option in KafkaIO for this, or do I have to write my own KafkaIO to provide it?
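One way to see what I am describing is to inspect the committed offset for the consumer group after the crash with a plain KafkaConsumer (a minimal sketch, not part of the pipeline; partition 0 is chosen just as an example, and bootstrapAddress/topic are the same variables as above):

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties checkProps = new Properties();
checkProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
checkProps.put(ConsumerConfig.GROUP_ID_CONFIG, "tracker-statistics-group");
checkProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
checkProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(checkProps)) {
    // Ask the broker which offset is currently committed for this group on partition 0.
    TopicPartition tp = new TopicPartition(topic, 0);
    OffsetAndMetadata committed = consumer.committed(tp);
    System.out.println("committed offset: " + (committed == null ? "none" : committed.offset()));
    // In my case this already points past the message that caused the crash,
    // which is why the message is never re-read when I restart the pipeline.
}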