Question

我想在3节点设置中以分布式模式运行kafka connect。但是发生了TimeoutException，甚至没有发布我的连接器。即使我在TimeoutException之前发布了一个连接器配置，它也没有到达端点，并且在REST服务器停止后返回了一个Request timed out响应。我正在其中一个节点上运行以下命令：
bin/connect-distributed etc/kafka/connect-distributed.properties
日志：

.
.
[2019-01-30 10:16:21,126] INFO Started o.e.j.s.ServletContextHandler@5fed9976{/,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler:850)
[2019-01-30 10:16:21,134] INFO Started http_8083@15d3793b{HTTP/1.1,[http/1.1]}{0.0.0.0:8083} (org.eclipse.jetty.server.AbstractConnector:292)
[2019-01-30 10:16:21,135] INFO Started @18558ms (org.eclipse.jetty.server.Server:408)
[2019-01-30 10:16:21,136] INFO Advertised URI: http://127.0.1.1:8083/ (org.apache.kafka.connect.runtime.rest.RestServer:267)
[2019-01-30 10:16:21,136] INFO REST server listening at http://127.0.1.1:8083/, advertising URL http://127.0.1.1:8083/ (org.apache.kafka.connect.runtime.rest.RestServer:217)
[2019-01-30 10:16:21,136] INFO Kafka Connect started (org.apache.kafka.connect.runtime.Connect:55)
[2019-01-30 10:17:20,594] ERROR Uncaught exception in herder work thread, exiting:  (org.apache.kafka.connect.runtime.distributed.DistributedHerder:228)
org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before the position for partition connect-offsets-41 could be determined
[2019-01-30 10:17:20,620] INFO Kafka Connect stopping (org.apache.kafka.connect.runtime.Connect:65)
[2019-01-30 10:17:20,620] INFO Stopping REST server (org.apache.kafka.connect.runtime.rest.RestServer:223)
.
.

connect-distributed.properties ：

bootstrap.servers=localhost:9092
group.id=connect-cluster
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.topic=connect-offsets
offset.storage.replication.factor=3
config.storage.topic=connect-configs
config.storage.replication.factor=3
status.storage.topic=connect-status
status.storage.replication.factor=3
offset.flush.interval.ms=10000
plugin.path=share/java,/root/confluent-5.1.0/share/confluent-hub-components

zookeeper.properties ：

dataDir=/var/zookeeper
clientPort=2181
maxClientCnxns=0 
tickTime=2000
server.1=current:2888:3888 # 0.0.0.0
server.2=kfk-2:2888:3888 #public ip
server.3=kfk-3:2888:3888 #public ip
initLimit=20
syncLimit=10

server.properties ：

broker.id=1 #for node-1
advertised.listeners=PLAINTEXT://<public-ip>:9092
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/var/kafka-logs
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=kfkp-1:2181,kfkp-2:2181,kfkp-3:2181 #private ips
zookeeper.connection.timeout.ms=6000

在服务器和Zookeeper配置之上，我能够使用生产者在所有节点上复制主题上发布的消息，以及必须在消费者命令中使用--partition 0的捕获。服务器日志中也有许多WARN消息。

配置是否有问题？为何REST服务器给出TimeoutException？

更新：

描述“连接状态：

Topic:connect-status    PartitionCount:10   ReplicationFactor:3 Configs:cleanup.policy=compact  
Topic: connect-status   Partition: 0    Leader: 3   Replicas: 3,1,2 Isr: 3,2,1  
Topic: connect-status   Partition: 1    Leader: 3   Replicas: 1,2,3 Isr: 3,2,1  
Topic: connect-status   Partition: 2    Leader: 3   Replicas: 2,3,1 Isr: 3,2,1  
Topic: connect-status   Partition: 3    Leader: 3   Replicas: 3,2,1 Isr: 3,2,1  
Topic: connect-status   Partition: 4    Leader: 3   Replicas: 1,3,2 Isr: 3,2,1  
Topic: connect-status   Partition: 5    Leader: 3   Replicas: 2,1,3 Isr: 3,2,1  
Topic: connect-status   Partition: 6    Leader: 3   Replicas: 3,1,2 Isr: 3,2,1  
Topic: connect-status   Partition: 7    Leader: 3   Replicas: 1,2,3 Isr: 3,2,1  
Topic: connect-status   Partition: 8    Leader: 3   Replicas: 2,3,1 Isr: 3,2,1  
Topic: connect-status   Partition: 9    Leader: 3   Replicas: 3,2,1 Isr: 3,2,1

描述“连接配置”：

Topic:connect-configs   PartitionCount:1    ReplicationFactor:3 Configs:cleanup.policy=compact
Topic: connect-configs  Partition: 0    Leader: 1   Replicas: 1,2,3 Isr: 3,2,1

描述“连接偏移量”：

Topic:connect-offsets   PartitionCount:50   ReplicationFactor:3 Configs:cleanup.policy=compact
Topic: connect-offsets  Partition: 0    Leader: 1   Replicas: 1,2,3 Isr: 3,2,1  
Topic: connect-offsets  Partition: 1    Leader: 2   Replicas: 2,3,1 Isr: 3,2,1  
Topic: connect-offsets  Partition: 2    Leader: 3   Replicas: 3,1,2 Isr: 3,2,1  
Topic: connect-offsets  Partition: 3    Leader: 1   Replicas: 1,3,2 Isr: 3,2,1  
Topic: connect-offsets  Partition: 4    Leader: 2   Replicas: 2,1,3 Isr: 3,2,1  
Topic: connect-offsets  Partition: 5    Leader: 3   Replicas: 3,2,1 Isr: 3,2,1  
Topic: connect-offsets  Partition: 6    Leader: 1   Replicas: 1,2,3 Isr: 3,2,1  
Topic: connect-offsets  Partition: 7    Leader: 2   Replicas: 2,3,1 Isr: 3,2,1  
.
.  
.
.
Topic: connect-offsets  Partition: 47   Leader: 3   Replicas: 3,2,1 Isr: 3,2,1  
Topic: connect-offsets  Partition: 48   Leader: 1   Replicas: 1,2,3 Isr: 3,2,1  
Topic: connect-offsets  Partition: 49   Leader: 2   Replicas: 2,3,1 Isr: 3,2,1

一些服务器日志中有WARN，我不了解它的含义：

.  
.
.
[2019-01-31 04:21:24,720] WARN [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Based on replica's leader epoch, leader replied with an unknown offset in connect-offsets-26. The initial fetch offset 0 will be used for truncation. (kafka.server.ReplicaFetcherThread)
[2019-01-31 04:21:24,721] INFO [Log partition=connect-offsets-26, dir=/var/kafka-logs] Truncating to 0 has no effect as the largest offset in the log is -1 (kafka.log.Log)
[2019-01-31 04:21:24,721] INFO [Log partition=y-1, dir=/var/kafka-logs] Truncating to 3 has no effect as the largest offset in the log is 2 (kafka.log.Log)
[2019-01-31 04:21:24,721] WARN [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Based on replica's leader epoch, leader replied with an unknown offset in connect-status-2. The initial fetch offset 0 will be used for truncation. (kafka.server.ReplicaFetcherThread)
[2019-01-31 04:21:24,721] INFO [Log partition=connect-status-2, dir=/var/kafka-logs] Truncating to 0 has no effect as the largest offset in the log is -1 (kafka.log.Log)
[2019-01-31 04:21:24,721] WARN [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Based on replica's leader epoch, leader replied with an unknown offset in connect-offsets-49. The initial fetch offset 0 will be used for truncation. (kafka.server.ReplicaFetcherThread)
[2019-01-31 04:21:24,721] INFO [Log partition=connect-offsets-49, dir=/var/kafka-logs] Truncating to 0 has no effect as the largest offset in the log is -1 (kafka.log.Log)
[2019-01-31 04:21:24,722] WARN [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Based on replica's leader epoch, leader replied with an unknown offset in connect-offsets-47. The initial fetch offset 0 will be used for truncation. (kafka.server.ReplicaFetcherThread)
[2019-01-31 04:21:24,722] INFO [Log partition=connect-offsets-47, dir=/var/kafka-logs] Truncating to 0 has no effect as the largest offset in the log is -1 (kafka.log.Log)
.
.
.
.
[2019-01-31 04:21:34,371] WARN [Producer clientId=producer-1] Error while fetching metadata with correlation id 98 : {__confluent.support.metrics=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2019-01-31 04:21:34,476] WARN [Producer clientId=producer-1] Error while fetching metadata with correlation id 99 : {__confluent.support.metrics=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2019-01-31 04:21:34,492] INFO [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. (org.apache.kafka.clients.producer.KafkaProducer)
[2019-01-31 04:21:34,502] ERROR Failed to submit metrics to Kafka topic __confluent.support.metrics (due to exception): java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 10000 ms. (io.confluent.support.metrics.submitters.KafkaSubmitter)
[2019-01-31 04:21:35,997] INFO Successfully submitted metrics to Confluent via secure endpoint (io.confluent.support.metrics.submitters.ConfluentSubmitter)
.
.
.

无法为分布式模式下的kafka-connect启动REST Server

0 个答案: