I have two Kafka (2.11-0.11.0.1) brokers. The default replication factor of topics is set to 2. Producers write data only to partition zero.
I have a scheduled executor that runs the task periodically. For topics that receive only a few records per minute (about 100), it works like a charm. But for a busy topic (about 10K records per minute) the poll method returns no data.
The task is:
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public final class TopicToDbPump implements Runnable {
    private static final Logger log = LoggerFactory.getLogger(TopicToDbPump.class);

    private final String topic;
    private final TopicPartition topicPartition;
    private final Properties properties;

    public TopicToDbPump(String topic, Properties properties) {
        this.topic = topic;
        topicPartition = new TopicPartition(topic, 0);
        this.properties = properties;
    }

    @Override
    public void run() {
        // A fresh consumer is created on every run, so it must reconnect to the cluster each time.
        try (final Consumer<String, String> consumer = new KafkaConsumer<>(properties)) {
            // Manually assign partition 0 and resume from the offset stored in the database.
            consumer.assign(Collections.singleton(topicPartition));
            final long offset = readOffsetFromDb(topic);
            consumer.seek(topicPartition, offset);

            // A single poll with a one-second timeout.
            final ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            if (records.isEmpty()) {
                log.debug("No data from topic " + topic + " available");
                return;
            }
            saveData(records.records(topic));
        } catch (Throwable t) {
            log.error("Etl process " + topic + " failed with exception", t);
        }
    }
}
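For context, a minimal sketch of how such a task might be scheduled; the topic name "my-topic", the props variable, and the one-minute interval are assumptions for illustration, not values from the question:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical scheduling code: run the pump once a minute on a single-threaded scheduler.
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(new TopicToDbPump("my-topic", props), 0, 1, TimeUnit.MINUTES);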
The consumer's parameters are:
"bootstrap.servers" = "host-1:9092,host-2:9092",
"group.id" = "my-group",
"enable.auto.commit" = "false",
"key.deserializer" = "org.apache.kafka.common.serialization.StringDeserializer",
"value.deserializer" = "org.apache.kafka.common.serialization.StringDeserializer",
"max.partition.fetch.bytes": "50000000",
"max.poll.records": "10000"
What's wrong?
Answer 0 (score: 0)
The Kafka Consumer API does not guarantee that the first call to poll() returns any data.
A consumer first has to connect to the cluster and discover the leaders of all the partitions it is assigned. As you can imagine, this can take a few seconds, so it is unlikely that data arrives immediately.
If no data is returned at first, you should call poll() several times instead.
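As a rough sketch, the single poll() in the question's run() method could be replaced by a small loop; maxAttempts and the per-poll timeout are illustrative assumptions, and consumer, topic, and saveData come from the question's class:

// Hypothetical retry loop: poll repeatedly until data arrives or the attempts run out.
ConsumerRecords<String, String> records = ConsumerRecords.empty();
int attempts = 0;
final int maxAttempts = 10; // assumed bound, tune to taste
while (records.isEmpty() && attempts < maxAttempts) {
    records = consumer.poll(Duration.ofSeconds(1));
    attempts++;
}
if (!records.isEmpty()) {
    saveData(records.records(topic));
}

This gives the freshly created consumer time to finish connecting and fetching metadata before the task gives up, which matters most on busy topics where each run starts from scratch.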