I have implemented a round-robin partitioner, as shown below:
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;
import org.apache.log4j.Logger;

public class KafkaRoundRobinPartitioner implements Partitioner {

    private static final Logger log = Logger.getLogger(KafkaRoundRobinPartitioner.class);

    final AtomicInteger counter = new AtomicInteger(0);

    public KafkaRoundRobinPartitioner() {}

    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int partitionsCount = partitions.size();
        int partitionId = counter.incrementAndGet() % partitionsCount;
        // reset the counter periodically so it does not overflow
        if (counter.get() > 65536) {
            counter.set(partitionId);
        }
        return partitionId;
    }

    @Override
    public void close() {
    }

    @Override
    public void configure(Map<String, ?> map) {
    }
}
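As an aside, the manual counter reset above has a small race window between `incrementAndGet()` and `set()` under concurrent sends. A sketch of an overflow-safe alternative (hypothetical, not part of the original partitioner) uses `Math.floorMod`, which stays non-negative even after the `AtomicInteger` wraps past `Integer.MAX_VALUE`, so no reset is needed:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: overflow-safe round-robin index selection.
// Math.floorMod keeps the result in [0, partitionsCount) even after
// the counter wraps to negative values, so no manual reset is needed.
public class RoundRobinDemo {
    static final AtomicInteger counter = new AtomicInteger(0);

    static int nextPartition(int partitionsCount) {
        return Math.floorMod(counter.getAndIncrement(), partitionsCount);
    }

    public static void main(String[] args) {
        // 32 sends over 32 partitions land on each partition exactly once
        int[] hits = new int[32];
        for (int i = 0; i < 32; i++) {
            hits[nextPartition(32)]++;
        }
        for (int h : hits) {
            assert h == 1;
        }

        // even at the wraparound point the index stays in range
        counter.set(Integer.MAX_VALUE);
        int p = nextPartition(32);
        assert p >= 0 && p < 32;
        System.out.println("round-robin ok");
    }
}
```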
Now I want to test whether each partition received the same number of messages. For example, if I have a topic with 32 partitions and I send 32 messages to it, I expect each partition to hold exactly 1 message.
I would like to do something like the following:
KafkaPartitions allPartitions = new KafkaTopic("topic_name");
for (KafkaPartition partition : allPartitions) {
    int msgCount = partition.getMessagesCount();
    // do asserts
}
As far as I know, the Kafka Java API does not provide such functionality, but I may have missed something in the documentation.
Is there an elegant way to implement this?
UPDATE: I found a basic solution. Since I use a multi-consumer model, I can do the following for each consumer:
consumer.assignment().size();
After that I can call:
consumer.poll(100);
and check that every consumer received a message. In that case I should not run into a situation where one consumer fetches a message from another consumer's partition, because with an equal number of consumers and partitions, Kafka should distribute the partitions among the consumers in a round-robin fashion.
Answer 0 (score: 0)
You can use seekToBeginning() and seekToEnd() to compute, for each partition, the difference between the two offsets it received.
Answer 1 (score: 0)
In the end I wrote something like the following.
My KafkaConsumer worker has this code:
public void run() {
    while (keepProcessing) {
        try {
            ConsumerRecords<byte[], byte[]> records = consumer.poll(100);
            for (ConsumerRecord<byte[], byte[]> record : records) {
                // do processing
                consumer.commitSync();
            }
        } catch (Exception e) {
            logger.error("Couldn't process message", e);
        }
    }
}
In my test I decided to check that each consumer made exactly one commit, which means the messages were distributed in round-robin fashion. Test code:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.junit.Test;

import static org.mockito.Mockito.spy;
import static org.mockito.Mockito.verify;

public class KafkaIntegrationTest {

    private int consumersAndPartitionsNumber;
    // created inside the test, after consumersAndPartitionsNumber is known;
    // a field initializer would build the latch with a count of 0
    private CountDownLatch latch;

    @Test
    public void testPartitions() throws Exception {
        consumersAndPartitionsNumber = Config.getConsumerThreadAmount(); // it's 5
        latch = new CountDownLatch(consumersAndPartitionsNumber);
        KafkaMessageQueue kafkaMessageQueue = new KafkaMessageQueue(); // just a class with Producer configuration
        String groupId = Config.getGroupId();
        List<KafkaConsumer<byte[], byte[]>> consumers = new ArrayList<>(consumersAndPartitionsNumber);
        for (int i = 0; i < consumersAndPartitionsNumber; i++) {
            consumers.add(spy(new KafkaConsumer<>(KafkaManager.createKafkaConsumerConfig(groupId))));
        }
        ExecutorService executor = Executors.newFixedThreadPool(consumersAndPartitionsNumber);
        for (KafkaConsumer<byte[], byte[]> consumer : consumers) {
            executor.submit(new TestKafkaWorker(consumer));
        }
        for (int i = 0; i < consumersAndPartitionsNumber; i++) {
            // send messages to topic
            kafkaMessageQueue.send(new PostMessage("pageid", "channel", "token", "POST", null, "{}"));
        }
        latch.await(60, TimeUnit.SECONDS);
        for (KafkaConsumer<byte[], byte[]> consumer : consumers) {
            verify(consumer).commitSync();
        }
    }

    class TestKafkaWorker implements Runnable {

        private final KafkaConsumer<byte[], byte[]> consumer;
        private boolean keepProcessing = true;

        TestKafkaWorker(KafkaConsumer<byte[], byte[]> consumer) {
            this.consumer = consumer;
            consumer.subscribe(Arrays.asList(Config.getTaskProcessingTopic()));
        }

        public void run() {
            while (keepProcessing) {
                try {
                    ConsumerRecords<byte[], byte[]> records = consumer.poll(100);
                    for (ConsumerRecord<byte[], byte[]> record : records) {
                        consumer.commitSync();
                        keepProcessing = false;
                        latch.countDown();
                    }
                } catch (Exception e) {
                    // ignore and keep polling until the latch times out
                }
            }
        }
    }
}
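An alternative assertion style, if the records themselves are in hand, is to tally `ConsumerRecord.partition()` values and check that the histogram is flat. A sketch over plain partition ids (the lists below stand in for the partitions of polled records; the helper name is hypothetical):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: verify even distribution by counting records per partition.
// In the real test the ints would come from ConsumerRecord.partition().
public class DistributionCheck {
    static boolean evenlyDistributed(List<Integer> partitions, int expectedPerPartition) {
        Map<Integer, Integer> tally = new HashMap<>();
        for (int p : partitions) {
            tally.merge(p, 1, Integer::sum);
        }
        for (int count : tally.values()) {
            if (count != expectedPerPartition) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 4 messages over 4 partitions, one each: evenly distributed
        assert evenlyDistributed(List.of(0, 1, 2, 3), 1);
        // two messages on partition 0: not evenly distributed
        assert !evenlyDistributed(List.of(0, 0, 1), 1);
        System.out.println("distribution ok");
    }
}
```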