I am using Kafka 0.10.0. Before processing, I want to know the number of records in a partition.
In version 0.9.0.1, I used the code below to find the difference between a partition's earliest and latest offsets. In the new version, it hangs when calling consumer#position (I captured a stack trace of that call).
The code:
    package org.apache.kafka.example.utils;

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import org.apache.commons.lang3.Range;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.PartitionInfo;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;
    import org.apache.logging.log4j.LogManager;
    import org.apache.logging.log4j.Logger;

    public class FindTopicRange {

        private static Logger logger = LogManager.getLogger();

        private FindTopicRange() {
            // Utility class; not meant to be instantiated.
        }

        // Returns the [earliest, latest] offset range for every partition of the topic.
        public static Map<TopicPartition, Range<Long>> getOffsets(String topic) {
            Map<TopicPartition, Range<Long>> partitionToRange = new HashMap<>();
            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(getConsumerConfigs())) {
                List<TopicPartition> partitions = new ArrayList<>();
                for (PartitionInfo partitionInfo : consumer.partitionsFor(topic)) {
                    partitions.add(new TopicPartition(partitionInfo.topic(), partitionInfo.partition()));
                }
                consumer.assign(partitions);
                for (TopicPartition partition : partitions) {
                    // Seeks are lazy; position() is the call that resolves the offset.
                    consumer.seekToBeginning(Collections.singletonList(partition));
                    long earliestOffset = consumer.position(partition);
                    consumer.seekToEnd(Collections.singletonList(partition));
                    long latestOffset = consumer.position(partition);
                    partitionToRange.put(partition, Range.between(earliestOffset, latestOffset));
                }
                return partitionToRange;
            } catch (Exception e) {
                logger.error("Exception while getting offset range information for topic : {}", topic, e);
            }
            return partitionToRange;
        }

        private static Properties getConsumerConfigs() {
            Properties configs = new Properties();
            configs.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            configs.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
            configs.put(ConsumerConfig.CLIENT_ID_CONFIG, "test");
            configs.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 10240);
            // A consumer needs Deserializer implementations for keys and values.
            configs.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getCanonicalName());
            configs.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getCanonicalName());
            return configs;
        }

        public static void main(String[] args) {
            System.out.println(getOffsets("hello"));
        }
    }
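Answer 0 (score: 2)
I was able to get your example working in Scala (I was already working on similar code). The only thing I added was a call to consumer.poll, because consumer.subscribe and consumer.assign are both lazy.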
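    import java.util
    import java.util.Collections
    import scala.collection.JavaConverters._
    import org.apache.kafka.common.TopicPartition

    // `consumer` and `topic` are set up as in the question's code.
    val partitions = new util.ArrayList[TopicPartition]
    for (partitionInfo <- consumer.partitionsFor(topic).asScala) {
      partitions.add(new TopicPartition(partitionInfo.topic, partitionInfo.partition))
    }
    consumer.assign(partitions)
    // assign is lazy, so poll once before seeking.
    val recordTemp = consumer.poll(1000)
    for (partition <- partitions.asScala) {
      consumer.seekToBeginning(Collections.singletonList(partition))
      println(consumer.position(partition)) // earliest offset
      consumer.seekToEnd(Collections.singletonList(partition))
      println(consumer.position(partition)) // latest offset
    }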
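The two positions printed for each partition are its earliest and latest offsets, and the size the question asks about is their difference. A minimal sketch of that arithmetic in plain Java follows, with no Kafka dependency. The class name PartitionSizes, the method sizes, and the string keys such as "hello-0" are hypothetical names used only for illustration. On a log-compacted topic, offsets can have gaps, so the difference is an upper bound on the record count rather than an exact figure.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionSizes {

    /**
     * Returns latest - earliest for every partition present in both maps.
     * Keys are hypothetical partition labels such as "hello-0".
     */
    public static Map<String, Long> sizes(Map<String, Long> earliest, Map<String, Long> latest) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : earliest.entrySet()) {
            Long end = latest.get(entry.getKey());
            if (end != null) {
                // Offsets are half-open: the latest offset is the next one to be written.
                result.put(entry.getKey(), end - entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> earliest = new LinkedHashMap<>();
        Map<String, Long> latest = new LinkedHashMap<>();
        earliest.put("hello-0", 5L);
        latest.put("hello-0", 42L);
        System.out.println(sizes(earliest, latest)); // prints {hello-0=37}
    }
}
```

In real use, the two maps would be filled from the consumer.position values obtained after seekToBeginning and seekToEnd.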
Answer 1 (score: 0)
Have you tried faking it with a new consumer group? This post shows that it can give you the lag values.
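One way to read this suggestion, sketched under assumptions: a partition's lag is its log-end offset minus the group's committed offset, and a brand-new group has no committed offset yet. If that group is configured with auto.offset.reset=earliest, it would start from the earliest offset, so its lag equals the full size of the partition. The class name ConsumerLag and the method lag are hypothetical, and the offsets are passed in as plain numbers rather than fetched from a broker.

```java
import java.util.OptionalLong;

public class ConsumerLag {

    /**
     * Lag of one partition: log-end offset minus the group's starting point.
     * With no committed offset (a new group), the earliest offset is used,
     * which assumes auto.offset.reset=earliest.
     */
    public static long lag(long earliest, long latest, OptionalLong committed) {
        long start = committed.orElse(earliest);
        return latest - start;
    }

    public static void main(String[] args) {
        // New group, nothing committed: lag is the whole partition.
        System.out.println(lag(5L, 42L, OptionalLong.empty())); // prints 37
        // Existing group that has committed offset 40: two records behind.
        System.out.println(lag(5L, 42L, OptionalLong.of(40L))); // prints 2
    }
}
```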