We are running Kafka broker version 0.10.2.0.
Our Kafka Streams version is 1.1.0.
On the Kafka broker machine that the consumer believes is the group coordinator, we see the following log line:
2018-08-18 11:54:12,693 [kafka-request-handler-5] TRACE (Logging.scala:36) - [KafkaApi-48476987] Sending join group response {error_code=16,generation_id=0,group_protocol=,leader_id=,member_id=,members=[]} for correlation id 538 to client chitraguptaV1-dcf33a6c-368e-472e-aee7-120f1216aa3f-StreamThread-2-consumer.
Because NotCoordinatorException is a retriable exception, the client keeps retrying, and the group coordinator keeps sending the same error code (error_code = 16) back to the consumer client.
On the Kafka Streams instance side, we see the following log lines:
Discovered group coordinator <groupcoordinator_host_name>:9092 (id: 2099006660 rack: null)
(Re-)joining group
Group coordinator <groupcoordinator_host_name>:9092 (id: 2099006660 rack: null) is unavailable or invalid, will attempt rediscovery
InitiateJoinGroup request failed This is not the correct coordinator.
Please help us resolve this issue.
Our Kafka cluster (broker) configuration:
delete.topic.enable: true
auto.create.topics.enable: true
unclean.leader.election.enable: false
controlled.shutdown.enable: true
controlled.shutdown.max.retries: 3
controlled.shutdown.retry.backoff.ms: 5000
default.replication.factor: 1
offsets.topic.num.partitions: 200
offsets.topic.replication.factor: 3
offsets.retention.check.interval.ms: 600000
offsets.commit.timeout.ms: 5000
num.network.threads: 3
num.replica.fetchers: 2
num.io.threads: 8
socket.send.buffer.bytes: 8388608
socket.receive.buffer.bytes: 8388608
socket.request.max.bytes: 314572800
log.retention.hours: 4
log.retention.bytes: 10737418240
log.segment.bytes: 536870912
log.cleanup.policy: delete
zookeeper.connection.timeout.ms: 6000
zookeeper.session.timeout.ms: 6000
zookeeper.sync.time.ms: 2000
queued.max.requests: 500
replica.lag.time.max: 10000
replica.fetch.wait.max.ms: 500
min.insync.replicas: 2
replica.fetch.max.bytes: 67108864
message.max.bytes: 67108864
replica.high.watermark.checkpoint.interval.ms: 5000
replica.socket.timeout.ms: 30000
replica.socket.receive.buffer.bytes: 65536
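Since the group coordinator is the broker currently leading the `__consumer_offsets` partition that owns the group, and we run with `offsets.topic.num.partitions: 200`, the partition owning our group can be derived from the group id. A minimal sketch of that mapping (the group id `chitraguptaV1` comes from the client id in the logs above; the hashing mirrors Kafka's `GroupMetadataManager.partitionFor`, i.e. `Utils.abs(groupId.hashCode) % offsets.topic.num.partitions`):

```java
public class CoordinatorPartition {

    // Mirrors Kafka's GroupMetadataManager.partitionFor:
    // Utils.abs(groupId.hashCode) % offsetsTopicNumPartitions.
    // Masking the sign bit gives the same result as Utils.abs for
    // every practical hash code and avoids the Integer.MIN_VALUE edge case.
    static int partitionFor(String groupId, int offsetsTopicNumPartitions) {
        return (groupId.hashCode() & 0x7fffffff) % offsetsTopicNumPartitions;
    }

    public static void main(String[] args) {
        // In Kafka Streams the application.id is the consumer group id.
        int p = partitionFor("chitraguptaV1", 200);
        System.out.println("__consumer_offsets partition owning the group: " + p);
        // The coordinator is the current leader of that partition;
        // error_code=16 means the broker we asked is not that leader.
    }
}
```

So the broker answering with error_code=16 is simply not the leader of that partition at the moment of the request, even though the client just "discovered" it as the coordinator.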
Our Kafka Streams side configuration:
props.setProperty(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, "1");
props.setProperty(StreamsConfig.STATE_CLEANUP_DELAY_MS_CONFIG, "1800000");
props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, WallclockTimestampExtractor.class.getName());
props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, CustomRocksDBConfig.class);
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, "3");
props.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, "2");
props.put(StreamsConfig.TOPIC_PREFIX + TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_DELETE);
props.put(StreamsConfig.TOPIC_PREFIX + TopicConfig.RETENTION_MS_CONFIG, "43200000");
props.put(StreamsConfig.TOPIC_PREFIX + TopicConfig.COMPRESSION_TYPE_CONFIG, "lz4");
props.put(StreamsConfig.CONSUMER_PREFIX + ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "240000");
props.put(StreamsConfig.REQUEST_TIMEOUT_MS_CONFIG, "300000");
props.put(StreamsConfig.RETRIES_CONFIG, "20");
props.put(StreamsConfig.RETRY_BACKOFF_MS_CONFIG, "2400");
props.put(ProducerConfig.BATCH_SIZE_CONFIG, "1048576");
props.put(ProducerConfig.LINGER_MS_CONFIG, "2400");
props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "300000");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
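To make the rebalance loop visible from the application side, one option we considered is attaching a state listener to the `KafkaStreams` instance built from the properties above. A minimal sketch (the bootstrap server and input topic names here are placeholders, not our real values; `setStateListener` is part of the 1.1.0 API):

```java
import java.util.Properties;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class StateListenerSketch {
    public static void main(String[] args) {
        // Placeholder config; in our app these come from the props shown above.
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "chitraguptaV1");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "<groupcoordinator_host_name>:9092");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic"); // hypothetical topic, just so the app has a task

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        // Log every state transition; a coordinator stuck returning
        // NOT_COORDINATOR (error_code=16) shows up as the instance cycling
        // in REBALANCING instead of reaching RUNNING.
        streams.setStateListener((newState, oldState) ->
                System.out.printf("Streams state: %s -> %s%n", oldState, newState));
        streams.start();
    }
}
```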