Apache Beam KafkaIO pipeline exception handling

Date: 2019-04-19 06:58:39

Tags: java apache-kafka apache-beam

I have a pipeline that reads from KafkaIO, using the Direct Runner:

    PipelineOptions options = PipelineOptionsFactory.as(PipelineOptions.class);
    Pipeline pipeline = Pipeline.create(options);
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "tracker-statistics-group");
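    // Explicitly disable the Kafka client's automatic offset commits; the intent
    // is that offsets are only committed back by KafkaIO itself (see
    // commitOffsetsInFinalize() below).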
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);

    PTransform<PBegin, PCollection<KV<String, String>>> kafkaIo = KafkaIO.<String, String>read()
            .withBootstrapServers(bootstrapAddress)
            .withTopic(topic)
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .updateConsumerProperties(props)
            .withReadCommitted()
            // offset consumed by the pipeline can be committed back.
            .commitOffsetsInFinalize()
            .withoutMetadata();

    pipeline
            .apply(kafkaIo)
            .apply(Values.create())
            .apply("ParseEvent", ParDo.of(new ParseEventFn()))
            .apply("test", ParDo.of(new PrintFn()));

    pipeline.run();

Every time the consumer receives a message, the offset is advanced automatically, even though the consumer's ENABLE_AUTO_COMMIT_CONFIG is set to false. When my pipeline then crashes with a runtime exception, I cannot read that message again, because its offset has already been committed. I thought .commitOffsetsInFinalize() guaranteed that offsets are only committed back after the pipeline has finished processing them. How can I get that behavior? Is there any option in KafkaIO for this, or should I write my own version of KafkaIO to provide it?
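One workaround I am considering is to keep the exception from escaping the DoFn at all: catch it and route the failing element to a dead-letter output, so a bad message never crashes the pipeline in the first place. Below is a minimal sketch of that pattern; the Event type, the parse() helper, and the tag names are my own placeholders, not part of the Beam API:

    public class ParseEventFn extends DoFn<String, Event> {
        // Main output: successfully parsed events.
        public static final TupleTag<Event> PARSED = new TupleTag<Event>() {};
        // Dead-letter output: raw payloads that failed to parse.
        public static final TupleTag<String> FAILED = new TupleTag<String>() {};

        @ProcessElement
        public void processElement(@Element String raw, MultiOutputReceiver out) {
            try {
                // parse() stands in for my actual parsing logic; it may throw.
                out.get(PARSED).output(parse(raw));
            } catch (RuntimeException e) {
                // Emit the raw payload on the dead-letter output instead of
                // letting the exception kill the pipeline.
                out.get(FAILED).output(raw);
            }
        }

        private Event parse(String raw) {
            // Placeholder for the real parsing logic; Event is a hypothetical type.
            return new Event(raw);
        }
    }

Wiring it up, the dead-letter output could then be written to a separate topic for inspection and replay:

    PCollectionTuple results = pipeline
            .apply(kafkaIo)
            .apply(Values.create())
            .apply("ParseEvent", ParDo.of(new ParseEventFn())
                    .withOutputTags(ParseEventFn.PARSED, TupleTagList.of(ParseEventFn.FAILED)));
    // results.get(ParseEventFn.FAILED) -> write to a dead-letter Kafka topic.

This avoids losing messages to a crash, but it still does not answer whether commitOffsetsInFinalize() can be made to defer the commit itself.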

0 Answers:

No answers yet