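I want to run Kafka Connect in distributed mode on a 3-node setup, but a TimeoutException occurs before I have even posted a connector. Even if I post a connector config before the TimeoutException, it does not reach the endpoint, and a Request timed out response is returned after the REST server stops.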
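For reference, posting a connector config to the Connect REST API looks roughly like the sketch below. It is only an illustration: the host `localhost`, the connector name `example-file-source`, and the FileStreamSource settings are hypothetical placeholders rather than the connector actually used here. Only the port 8083 comes from the log further down.

```python
import json
import urllib.request


def build_create_connector_request(base_url, name, config):
    """Build a POST /connectors request for the Kafka Connect REST API."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/connectors",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def post_connector(request, timeout_s=30):
    """Send the request and return (status code, response body as text)."""
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return resp.status, resp.read().decode("utf-8")


# Hypothetical connector definition, used only to show the payload shape.
EXAMPLE_CONFIG = {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/example.txt",   # placeholder path
    "topic": "example-topic",     # placeholder topic
}

example_request = build_create_connector_request(
    "http://localhost:8083", "example-file-source", EXAMPLE_CONFIG
)
```

Calling `post_connector(example_request)` against a running worker would send it. In the situation described above, that call is the one that ends in a request timeout.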
I am running the following command on one of the nodes:
bin/connect-distributed etc/kafka/connect-distributed.properties
Logs:
.
.
[2019-01-30 10:16:21,126] INFO Started o.e.j.s.ServletContextHandler@5fed9976{/,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler:850)
[2019-01-30 10:16:21,134] INFO Started http_8083@15d3793b{HTTP/1.1,[http/1.1]}{0.0.0.0:8083} (org.eclipse.jetty.server.AbstractConnector:292)
[2019-01-30 10:16:21,135] INFO Started @18558ms (org.eclipse.jetty.server.Server:408)
[2019-01-30 10:16:21,136] INFO Advertised URI: http://127.0.1.1:8083/ (org.apache.kafka.connect.runtime.rest.RestServer:267)
[2019-01-30 10:16:21,136] INFO REST server listening at http://127.0.1.1:8083/, advertising URL http://127.0.1.1:8083/ (org.apache.kafka.connect.runtime.rest.RestServer:217)
[2019-01-30 10:16:21,136] INFO Kafka Connect started (org.apache.kafka.connect.runtime.Connect:55)
[2019-01-30 10:17:20,594] ERROR Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:228)
org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before the position for partition connect-offsets-41 could be determined
[2019-01-30 10:17:20,620] INFO Kafka Connect stopping (org.apache.kafka.connect.runtime.Connect:65)
[2019-01-30 10:17:20,620] INFO Stopping REST server (org.apache.kafka.connect.runtime.rest.RestServer:223)
.
.
connect-distributed.properties :
bootstrap.servers=localhost:9092
group.id=connect-cluster
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.topic=connect-offsets
offset.storage.replication.factor=3
config.storage.topic=connect-configs
config.storage.replication.factor=3
status.storage.topic=connect-status
status.storage.replication.factor=3
offset.flush.interval.ms=10000
plugin.path=share/java,/root/confluent-5.1.0/share/confluent-hub-components
zookeeper.properties :
dataDir=/var/zookeeper
clientPort=2181
maxClientCnxns=0
tickTime=2000
server.1=current:2888:3888 # 0.0.0.0
server.2=kfk-2:2888:3888 #public ip
server.3=kfk-3:2888:3888 #public ip
initLimit=20
syncLimit=10
server.properties :
broker.id=1 #for node-1
advertised.listeners=PLAINTEXT://<public-ip>:9092
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/var/kafka-logs
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=kfkp-1:2181,kfkp-2:2181,kfkp-3:2181 #private ips
zookeeper.connection.timeout.ms=6000
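With the server and ZooKeeper configuration above, messages published to a topic by a producer are replicated across all nodes, with the catch that I have to pass --partition 0 in the consumer command.
There are also many WARN messages in the server logs.
Is something wrong with the configuration? Why is the REST server giving a TimeoutException?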
Update:
Describing connect-status:
Topic:connect-status PartitionCount:10 ReplicationFactor:3 Configs:cleanup.policy=compact
Topic: connect-status Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 3,2,1
Topic: connect-status Partition: 1 Leader: 3 Replicas: 1,2,3 Isr: 3,2,1
Topic: connect-status Partition: 2 Leader: 3 Replicas: 2,3,1 Isr: 3,2,1
Topic: connect-status Partition: 3 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
Topic: connect-status Partition: 4 Leader: 3 Replicas: 1,3,2 Isr: 3,2,1
Topic: connect-status Partition: 5 Leader: 3 Replicas: 2,1,3 Isr: 3,2,1
Topic: connect-status Partition: 6 Leader: 3 Replicas: 3,1,2 Isr: 3,2,1
Topic: connect-status Partition: 7 Leader: 3 Replicas: 1,2,3 Isr: 3,2,1
Topic: connect-status Partition: 8 Leader: 3 Replicas: 2,3,1 Isr: 3,2,1
Topic: connect-status Partition: 9 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
Describing connect-configs:
Topic:connect-configs PartitionCount:1 ReplicationFactor:3 Configs:cleanup.policy=compact
Topic: connect-configs Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 3,2,1
Describing connect-offsets:
Topic:connect-offsets PartitionCount:50 ReplicationFactor:3 Configs:cleanup.policy=compact
Topic: connect-offsets Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 3,2,1
Topic: connect-offsets Partition: 1 Leader: 2 Replicas: 2,3,1 Isr: 3,2,1
Topic: connect-offsets Partition: 2 Leader: 3 Replicas: 3,1,2 Isr: 3,2,1
Topic: connect-offsets Partition: 3 Leader: 1 Replicas: 1,3,2 Isr: 3,2,1
Topic: connect-offsets Partition: 4 Leader: 2 Replicas: 2,1,3 Isr: 3,2,1
Topic: connect-offsets Partition: 5 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
Topic: connect-offsets Partition: 6 Leader: 1 Replicas: 1,2,3 Isr: 3,2,1
Topic: connect-offsets Partition: 7 Leader: 2 Replicas: 2,3,1 Isr: 3,2,1
.
.
.
.
Topic: connect-offsets Partition: 47 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
Topic: connect-offsets Partition: 48 Leader: 1 Replicas: 1,2,3 Isr: 3,2,1
Topic: connect-offsets Partition: 49 Leader: 2 Replicas: 2,3,1 Isr: 3,2,1
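To read the describe output above more easily, a small script along these lines could summarize it. This is a minimal sketch that assumes the text is pasted in exactly the format shown. The helper names `parse_describe`, `under_replicated`, and `leaders_per_broker` are made up for illustration and are not part of any Kafka tooling.

```python
import re
from collections import Counter

# Matches the per-partition lines of the pasted describe output.
# The "Topic:... PartitionCount:..." header lines do not match and are skipped.
LINE_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>\S+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def parse_describe(text):
    """Return one dict per partition line found in the pasted text."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # header line or elision dots
        leader = m.group("leader")
        rows.append({
            "topic": m.group("topic"),
            "partition": int(m.group("partition")),
            # A non-numeric leader (for example "none") is kept as None.
            "leader": int(leader) if leader.lstrip("-").isdigit() else None,
            "replicas": [int(x) for x in m.group("replicas").split(",")],
            "isr": [int(x) for x in m.group("isr").split(",") if x],
        })
    return rows


def under_replicated(rows):
    """Partitions whose in-sync replica set differs from the assigned replicas."""
    return [r for r in rows if set(r["isr"]) != set(r["replicas"])]


def leaders_per_broker(rows):
    """Count how many of the listed partitions each broker currently leads."""
    return Counter(r["leader"] for r in rows)


if __name__ == "__main__":
    sample = """\
Topic:connect-status PartitionCount:10 ReplicationFactor:3 Configs:cleanup.policy=compact
Topic: connect-status Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 3,2,1
Topic: connect-status Partition: 1 Leader: 3 Replicas: 1,2,3 Isr: 3,2,1
"""
    rows = parse_describe(sample)
    print("under-replicated:", under_replicated(rows))
    print("leaders per broker:", dict(leaders_per_broker(rows)))
```

Such a summary only reflects what the describe command reported at that moment. It does not say whether a client can actually reach each partition leader.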
Some of the server logs contain WARN messages whose meaning I do not understand:
.
.
.
[2019-01-31 04:21:24,720] WARN [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Based on replica's leader epoch, leader replied with an unknown offset in connect-offsets-26. The initial fetch offset 0 will be used for truncation. (kafka.server.ReplicaFetcherThread)
[2019-01-31 04:21:24,721] INFO [Log partition=connect-offsets-26, dir=/var/kafka-logs] Truncating to 0 has no effect as the largest offset in the log is -1 (kafka.log.Log)
[2019-01-31 04:21:24,721] INFO [Log partition=y-1, dir=/var/kafka-logs] Truncating to 3 has no effect as the largest offset in the log is 2 (kafka.log.Log)
[2019-01-31 04:21:24,721] WARN [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Based on replica's leader epoch, leader replied with an unknown offset in connect-status-2. The initial fetch offset 0 will be used for truncation. (kafka.server.ReplicaFetcherThread)
[2019-01-31 04:21:24,721] INFO [Log partition=connect-status-2, dir=/var/kafka-logs] Truncating to 0 has no effect as the largest offset in the log is -1 (kafka.log.Log)
[2019-01-31 04:21:24,721] WARN [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Based on replica's leader epoch, leader replied with an unknown offset in connect-offsets-49. The initial fetch offset 0 will be used for truncation. (kafka.server.ReplicaFetcherThread)
[2019-01-31 04:21:24,721] INFO [Log partition=connect-offsets-49, dir=/var/kafka-logs] Truncating to 0 has no effect as the largest offset in the log is -1 (kafka.log.Log)
[2019-01-31 04:21:24,722] WARN [ReplicaFetcher replicaId=2, leaderId=3, fetcherId=0] Based on replica's leader epoch, leader replied with an unknown offset in connect-offsets-47. The initial fetch offset 0 will be used for truncation. (kafka.server.ReplicaFetcherThread)
[2019-01-31 04:21:24,722] INFO [Log partition=connect-offsets-47, dir=/var/kafka-logs] Truncating to 0 has no effect as the largest offset in the log is -1 (kafka.log.Log)
.
.
.
.
[2019-01-31 04:21:34,371] WARN [Producer clientId=producer-1] Error while fetching metadata with correlation id 98 : {__confluent.support.metrics=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2019-01-31 04:21:34,476] WARN [Producer clientId=producer-1] Error while fetching metadata with correlation id 99 : {__confluent.support.metrics=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2019-01-31 04:21:34,492] INFO [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. (org.apache.kafka.clients.producer.KafkaProducer)
[2019-01-31 04:21:34,502] ERROR Failed to submit metrics to Kafka topic __confluent.support.metrics (due to exception): java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 10000 ms. (io.confluent.support.metrics.submitters.KafkaSubmitter)
[2019-01-31 04:21:35,997] INFO Successfully submitted metrics to Confluent via secure endpoint (io.confluent.support.metrics.submitters.ConfluentSubmitter)
.
.
.