Kafka Consumer.poll does not return any data

Time: 2019-03-04 16:59:36

Tags: apache-kafka kafka-consumer-api

I have two Kafka (2.11-0.11.0.1) brokers. The default replication factor for topics is set to 2. Producers write data to partition zero only.

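I also have a scheduled executor that runs a task periodically. When the task consumes a topic with only a few records (about 100 per minute), it works like a charm. But for a high-volume topic (about 10K records per minute), the poll method does not return any data.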

The task is:

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public final class TopicToDbPump implements Runnable {
  private static final Logger log = LoggerFactory.getLogger(TopicToDbPump.class);
  private final String topic;
  private final TopicPartition topicPartition;
  private final Properties properties;

  public TopicToDbPump(String topic, Properties properties) {
    this.topic = topic;
    topicPartition = new TopicPartition(topic, 0);
    this.properties = properties;
  }

  @Override
  public void run() {
    try (final Consumer<String, String> consumer = new KafkaConsumer<>(properties)) {
      consumer.assign(Collections.singleton(topicPartition));
      final long offset = readOffsetFromDb(topic);
      consumer.seek(topicPartition, offset);
      final ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
      if (records.isEmpty()) {
        log.debug("No data from topic " + topic + " available");
        return;
      }
      saveData(records.records(topic));
    } catch (Throwable t) {
      log.error("Etl process " + topic + " failed with exception", t);
    }
  }
}

The consumer's parameters are:

"bootstrap.servers" = "host-1:9092,host-2:9092",
"group.id" = "my-group",
"enable.auto.commit" = "false",
"key.deserializer" = "org.apache.kafka.common.serialization.StringDeserializer",
"value.deserializer" = "org.apache.kafka.common.serialization.StringDeserializer",
"max.partition.fetch.bytes" = "50000000",
"max.poll.records" = "10000"

What is wrong?

1 answer:

Answer 0: (score: 0)

The Kafka Consumer API does not guarantee that the first call to poll() will return any data.

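The consumer first has to connect to the cluster and discover the leaders of all the partitions assigned to it. As you can imagine, this can take a few seconds, so it is unlikely to receive data right away.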

If no data is returned at first, you should call poll() several more times before giving up.
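A rough sketch of what "call poll() several times" could look like, written as a small generic helper so it can be tried without a broker. The class name `PollRetry`, the method `pollUntilData`, and the overall time budget are assumptions made for illustration; none of them are part of the Kafka API.

```java
import java.time.Duration;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper (not part of Kafka): retries a poll-like call
// until it yields data or an overall time budget runs out.
public final class PollRetry {
  private PollRetry() {}

  /**
   * Calls pollOnce repeatedly until isEmpty reports a non-empty batch
   * or the timeout elapses. Returns the last batch obtained, which may
   * still be empty if the timeout was reached.
   */
  public static <T> T pollUntilData(Supplier<T> pollOnce,
                                    Predicate<T> isEmpty,
                                    Duration timeout) {
    final long deadline = System.nanoTime() + timeout.toNanos();
    // Always poll at least once, even with a zero timeout.
    T batch = pollOnce.get();
    // Keep polling while nothing has arrived and the budget is not spent.
    // With a real consumer each call blocks up to its own poll timeout,
    // so this loop does not spin.
    while (isEmpty.test(batch) && System.nanoTime() - deadline < 0) {
      batch = pollOnce.get();
    }
    return batch;
  }
}
```

With the consumer from the question, the call might look like `PollRetry.pollUntilData(() -> consumer.poll(Duration.ofSeconds(1)), ConsumerRecords::isEmpty, Duration.ofSeconds(30))`. The 30-second budget is an arbitrary value chosen for the example, not a recommendation.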