I am using a Kafka direct stream to read logs from Kafka:
KafkaUtils.createDirectStream(streamingContext, LocationStrategies.PreferConsistent(), ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));
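For context, here is a minimal sketch of the surrounding setup, following the spark-streaming-kafka-0-10 integration guide; the broker address, group id, and variable names are placeholder assumptions, but the important part is that auto-commit is disabled so offsets are only committed manually:

import java.util.*;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.kafka010.*;

Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put("bootstrap.servers", "localhost:9092"); // assumed broker address
kafkaParams.put("key.deserializer", StringDeserializer.class);
kafkaParams.put("value.deserializer", StringDeserializer.class);
kafkaParams.put("group.id", "raul-consumer"); // assumed group id
kafkaParams.put("auto.offset.reset", "earliest");
kafkaParams.put("enable.auto.commit", false); // commit manually via CanCommitOffsets

Collection<String> topics = Collections.singletonList("raul-2");

JavaInputDStream<ConsumerRecord<String, String>> dStream =
        KafkaUtils.createDirectStream(
                streamingContext,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));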
Also, during processing, if anything goes wrong in any partition, I create new offsets so that those records show up as unprocessed and I can retry them (or so I thought).
Here is the code for that part:
dStream.foreachRDD((JavaRDD<ConsumerRecord<String, String>> rdd) -> {
    OffsetRange[] offsets = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
    LOG.info(Arrays.toString(offsets));
    if (!rdd.isEmpty()) {
        JavaRDD<OffsetRange> rddUploadResult = rdd.mapPartitions(
                (FlatMapFunction<Iterator<ConsumerRecord<String, String>>, OffsetRange>) iterator -> {
            // Some processing...
            Result result = worker.call();
            // With the direct stream, RDD partition i maps 1:1 to offsets[i],
            // so TaskContext.getPartitionId() indexes into offsets
            OffsetRange partitionOffset = offsets[TaskContext.getPartitionId()];
            if (result.isSuccessful()) {
                LOG.info("Batch successful. Committing offsets.");
                return Collections.singletonList(partitionOffset).iterator();
            } else {
                LOG.info("Batch unsuccessful. Creating new offsets.");
                // Rewind: build an empty range [fromOffset -> fromOffset] so the
                // failed records look unprocessed and (I hoped) get re-read
                return Collections.singletonList(
                        OffsetRange.create(partitionOffset.topicPartition(),
                                partitionOffset.fromOffset(),
                                partitionOffset.fromOffset())).iterator();
            }
        });
        OffsetRange[] newOffsets = rddUploadResult.collect().toArray(new OffsetRange[0]);
        LOG.info("NEW OFFSETS: {}", Arrays.toString(newOffsets));
        // Commit the (possibly rewound) offsets
        ((CanCommitOffsets) dStream.inputDStream()).commitAsync(newOffsets);
    } else {
        LOG.debug("Nothing to do...");
    }
});
But when I run it, this is the output:
# Init state [OffsetRange(topic: 'raul-2', partition: 1, range: [4 -> 4])]
2017-10-03 12:14:40,262 INFO [streaming-job-executor-0] reader.KafkaConsumer (KafkaConsumer.java:lambda$initStream$2b398ea6$1(73)) - [OffsetRange(topic: 'raul-2', partition: 1, range: [4 -> 4]), OffsetRange(topic: 'raul-2', partition: 0, range: [4 -> 4])]
# New log! [OffsetRange(topic: 'raul-2', partition: 1, range: [4 -> 5])]
2017-10-03 12:14:50,269 INFO [streaming-job-executor-0] reader.KafkaConsumer (KafkaConsumer.java:lambda$initStream$2b398ea6$1(73)) - [OffsetRange(topic: 'raul-2', partition: 1, range: [4 -> 5]), OffsetRange(topic: 'raul-2', partition: 0, range: [4 -> 4])]
2017-10-03 12:14:52,855 WARN [Executor task launch worker for task 2] writer.S3Uploader (Worker.java:multipartUpload(31)) - Failed Bundle{data = fail }
# Processing fails so I don't want to mark this as processed
2017-10-03 12:14:52,855 INFO [Executor task launch worker for task 2] reader.KafkaConsumer (KafkaConsumer.java:lambda$null$46bbe1a2$1(94)) - Batch unsuccessful. Creating new offsets.
# Go back to init state [OffsetRange(topic: 'raul-2', partition: 1, range: [4 -> 4])]
2017-10-03 12:14:52,859 INFO [streaming-job-executor-0] reader.KafkaConsumer (KafkaConsumer.java:lambda$initStream$2b398ea6$1(105)) - NEW OFFSETS: [OffsetRange(topic: 'raul-2', partition: 1, range: [4 -> 4]), OffsetRange(topic: 'raul-2', partition: 0, range: [4 -> 4])]
# Offsets read in next batch [OffsetRange(topic: 'raul-2', partition: 1, range: [5 -> 5])] ??
lambda$initStream$2b398ea6$1(73)) - [OffsetRange(topic: 'raul-2', partition: 1, range: [5 -> 5]), OffsetRange(topic: 'raul-2', partition: 0, range: [4 -> 4])]
Why does the next batch start at offset 5 for partition 1 even though the committed range was [4 -> 4]? I expected the failed record at offset 4 to be re-read.
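For reference, the basic manual-commit pattern from the spark-streaming-kafka-0-10 integration guide, which my code is based on, commits the batch's own offset ranges after processing; this sketch elides the actual record processing:

dStream.foreachRDD(rdd -> {
    OffsetRange[] offsetRanges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
    // ... process the records in rdd ...
    ((CanCommitOffsets) dStream.inputDStream()).commitAsync(offsetRanges);
});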