Kafka offset commit not working?

Asked: 2017-10-03 11:37:55

Tags: apache-spark apache-kafka spark-streaming

I am using a Kafka direct stream to read logs from Kafka:

KafkaUtils.createDirectStream(
        streamingContext,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));
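For context, `topics` and `kafkaParams` are not shown in the question. A minimal sketch of a setup matching this stream, with the broker address and group id as assumptions (the topic name is taken from the logs below); auto-commit is kept off so offsets only move when commitAsync is called:

Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put("bootstrap.servers", "localhost:9092"); // assumption
kafkaParams.put("key.deserializer", StringDeserializer.class);
kafkaParams.put("value.deserializer", StringDeserializer.class);
kafkaParams.put("group.id", "my-consumer-group");       // assumption
kafkaParams.put("auto.offset.reset", "earliest");
// Keep auto-commit off so offsets only advance via commitAsync.
kafkaParams.put("enable.auto.commit", false);

Collection<String> topics = Collections.singletonList("raul-2"); // topic from the logs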

During processing, if anything goes wrong in any partition, I create new offsets so that those records still appear unprocessed and I can retry them (or so I thought).

Here is the relevant part of the code:

dStream.foreachRDD((JavaRDD<ConsumerRecord<String, String>> rdd) -> {
    // Offset ranges covered by this batch, one entry per Kafka partition.
    OffsetRange[] offsets = ((HasOffsetRanges) rdd.rdd()).offsetRanges();

    LOG.info(Arrays.toString(offsets));

    if (!rdd.isEmpty()) {

        // For each partition, emit the range to commit: the full range on
        // success, or a zero-length range (from -> from) on failure so the
        // records still look unprocessed.
        JavaRDD<OffsetRange> rddUploadResult = rdd.mapPartitions(
                (FlatMapFunction<Iterator<ConsumerRecord<String, String>>, OffsetRange>) iterator -> {

            // Some processing...

            Result result = worker.call();

            OffsetRange partitionOffset = offsets[TaskContext.getPartitionId()];

            if (result.isSuccessful()) {
                LOG.info("Batch successful. Committing offsets.");
                return Collections.singletonList(partitionOffset).iterator();
            } else {
                LOG.info("Batch unsuccessful. Creating new offsets.");

                // untilOffset == fromOffset, i.e. "nothing consumed".
                return Collections.singletonList(OffsetRange.create(
                        partitionOffset.topicPartition(),
                        partitionOffset.fromOffset(),
                        partitionOffset.fromOffset())).iterator();
            }
        });

        OffsetRange[] newOffsets = rddUploadResult.collect().toArray(new OffsetRange[0]);

        LOG.info("NEW OFFSETS: {}", Arrays.toString(newOffsets));

        // Commit the (possibly rolled-back) offsets back to Kafka.
        ((CanCommitOffsets) dStream.inputDStream()).commitAsync(newOffsets);

    } else {
        LOG.debug("Nothing to do...");
    }
});
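For reference, the pattern in the Spark Streaming + Kafka integration guide commits the batch's own offsetRanges unchanged once processing succeeds; my code only deviates by substituting zero-length ranges for failed partitions. The documented baseline looks roughly like this:

dStream.foreachRDD(rdd -> {
    OffsetRange[] offsetRanges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();

    // ... process the batch; commit only after everything has succeeded ...

    ((CanCommitOffsets) dStream.inputDStream()).commitAsync(offsetRanges);
});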

But when I run it, this is the output:

# Init state [OffsetRange(topic: 'raul-2', partition: 1, range: [4 -> 4])]

2017-10-03 12:14:40,262 INFO  [streaming-job-executor-0] reader.KafkaConsumer (KafkaConsumer.java:lambda$initStream$2b398ea6$1(73)) - [OffsetRange(topic: 'raul-2', partition: 1, range: [4 -> 4]), OffsetRange(topic: 'raul-2', partition: 0, range: [4 -> 4])]

# New log! [OffsetRange(topic: 'raul-2', partition: 1, range: [4 -> 5])]

2017-10-03 12:14:50,269 INFO  [streaming-job-executor-0] reader.KafkaConsumer (KafkaConsumer.java:lambda$initStream$2b398ea6$1(73)) - [OffsetRange(topic: 'raul-2', partition: 1, range: [4 -> 5]), OffsetRange(topic: 'raul-2', partition: 0, range: [4 -> 4])]

2017-10-03 12:14:52,855 WARN  [Executor task launch worker for task 2] writer.S3Uploader (Worker.java:multipartUpload(31)) - Failed Bundle{data = fail }

# Processing fails so I don't want to mark this as processed

2017-10-03 12:14:52,855 INFO  [Executor task launch worker for task 2] reader.KafkaConsumer (KafkaConsumer.java:lambda$null$46bbe1a2$1(94)) - Batch unsuccessful. Creating new offsets.

# Go back to init state [OffsetRange(topic: 'raul-2', partition: 1, range: [4 -> 4])]

2017-10-03 12:14:52,859 INFO  [streaming-job-executor-0] reader.KafkaConsumer (KafkaConsumer.java:lambda$initStream$2b398ea6$1(105)) - NEW OFFSETS: [OffsetRange(topic: 'raul-2', partition: 1, range: [4 -> 4]), OffsetRange(topic: 'raul-2', partition: 0, range: [4 -> 4])]

# Offsets read in next batch [OffsetRange(topic: 'raul-2', partition: 1, range: [5 -> 5])] ??

lambda$initStream$2b398ea6$1(73)) - [OffsetRange(topic: 'raul-2', partition: 1, range: [5 -> 5]), OffsetRange(topic: 'raul-2', partition: 0, range: [4 -> 4])]
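So why does the next batch start at offset 5 for partition 1 when the range I committed was [4 -> 4]? To check whether the commit actually reached Kafka, the group's committed offset can be read back with a standalone org.apache.kafka.clients.consumer.KafkaConsumer (not my own reader.KafkaConsumer class). A minimal sketch, assuming a local broker; the group.id must match the stream's:

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");                   // assumption
props.put("group.id", "my-consumer-group");                         // the stream's group
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    TopicPartition tp = new TopicPartition("raul-2", 1);
    // Reads the last offset committed to Kafka for this group and partition.
    OffsetAndMetadata committed = consumer.committed(tp);
    LOG.info("Committed offset for {}: {}", tp, committed);
}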

0 answers:

No answers