How do I deploy Kafka Connect in distributed mode?

Asked: 2019-08-28 20:27:51

Tags: distributed apache-kafka-connect

I am building a Kafka Connect application with the JDBC sink connector in Kubernetes. I tried standalone mode and it works; now I want to move to distributed mode. I can successfully bring up two pods (Kafka Connect workers) by applying the following YAML:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  namespace: vtq
  name: kafka-sink-postgres-dis
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: kafka-sink-postgres-dis
    spec:
      containers:
      - name: kafka-sink-postgres-dis
        image: ***
        imagePullPolicy: Always
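Note that this Deployment does not expose the workers' REST port, so nothing outside the pods can reach the Connect API yet. A minimal sketch of one way to expose it inside the cluster, assuming the pods serve the REST API on 8083 (the service name kafka-connect-rest is hypothetical, not part of the original setup):

# Hypothetical: put a Service in front of the workers' REST port 8083
kubectl -n vtq expose deployment kafka-sink-postgres-dis \
  --name=kafka-connect-rest --port=8083 --target-port=8083

Each container starts a worker with: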

bin/connect-distributed.sh config/worker.properties

bootstrap.servers=***:9092
offset.flush.interval.ms=10000

rest.port=8083
rest.host.name=127.0.0.1


key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081


# Prevent the connector from pulling all historical messages
auto.offset.reset=latest

# options below may be required for distributed mode

# unique name for the cluster, used in forming the Connect cluster group. Note that this must not conflict with consumer group IDs
group.id=connect-postgres-sink-dis

# Topic to use for storing offsets. This topic should have many partitions and be replicated.
offset.storage.topic=postgres-connect-offsets
offset.storage.replication.factor=3

# Topic to use for storing connector and task configurations; note that this should be a single partition, highly replicated topic.
# You MUST manually create the topic to ensure single partition for the config topic as auto created topics may have multiple partitions.

config.storage.topic=postgres-connect-configs
config.storage.replication.factor=3

# Topic to use for storing statuses. This topic can have multiple partitions and should be replicated.
status.storage.topic=postgres-connect-status
status.storage.replication.factor=3
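Two caveats about this worker config, flagged here as assumptions rather than anything the original files state: with rest.host.name=127.0.0.1 each worker advertises loopback, so the workers cannot forward REST requests to one another, and in Kubernetes each worker would typically need to advertise a pod-reachable address instead. Also, consumer overrides in a worker config take a "consumer." prefix, so the bare auto.offset.reset above is likely ignored. A sketch:

# Hypothetical: advertise an address the other pods can actually reach
rest.advertised.host.name=<pod-ip-or-dns>
rest.advertised.port=8083

# Consumer overrides need the "consumer." prefix in a worker config
consumer.auto.offset.reset=latest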

Then, inside each pod, I created a sink connector with tasks.max=1, and the two connectors listen to the same topic. It turned out they simply duplicated each other's work:

curl -X POST -H "Content-Type: application/json" --data '{"name": "postgres_sink_dis", "config": {"connector.class":"io.confluent.connect.jdbc.JdbcSinkConnector", "tasks.max":"1", "connection.url":"***","topics":"***"}}' http://127.0.0.1:8083/connectors
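To sanity-check the result, the standard Connect REST endpoints can show what the cluster actually registered:

# List all connectors known to the cluster
curl -s http://127.0.0.1:8083/connectors

# Show the stored configuration of this connector
curl -s http://127.0.0.1:8083/connectors/postgres_sink_dis/config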

However, I am confused by the concepts of a Kafka Connect cluster, workers, connectors, and tasks. I followed https://github.com/enfuse/kafka-connect-demo/blob/master/docs/install-connector.md, where they port-forward to the REST port before configuring the connector. I tried that, but after deploying the service and creating the connector, curl -s 172.0.0.1:8083/connectors returned nothing.
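(Note that 172.0.0.1 above is presumably a typo for 127.0.0.1, which by itself would explain the empty response.) For reference, a port-forwarding sketch of the kind that demo uses, with names taken from the Deployment above:

# Forward local port 8083 to one worker's REST port
kubectl -n vtq port-forward deploy/kafka-sink-postgres-dis 8083:8083

# In another shell, query the REST API through the tunnel
curl -s http://127.0.0.1:8083/connectors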

Could anyone give me a brief description of what I should do next? Any related information would be very helpful. Thanks!

Update: I finally tracked down the problem and solved it.

1. Deploy the two pods/workers with the same group.id but a different rest.port for each (see https://docs.confluent.io/current/connect/userguide.html).
2. In one of the pods, create a single connector that carries the tasks (sketched below).
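In other words, the connector is registered once against the cluster, with tasks.max raised so its tasks can be spread over the workers. A sketch with the connection details elided as in the original (tasks.max=2 is an illustrative choice):

# Register ONE connector; the cluster distributes its tasks over the workers
curl -X POST -H "Content-Type: application/json" --data '{
  "name": "postgres_sink_dis",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "2",
    "connection.url": "***",
    "topics": "***"
  }}' http://127.0.0.1:8083/connectors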

1 Answer:

Answer 0 (score: 0)

For example, say you have a Connect cluster made up of two workers (two pods). You can create a connector (sink or source) with multiple tasks in that cluster, and the tasks will be distributed across the two workers.
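Assuming the connector was created with more than one task, the status endpoint is a quick way to observe that distribution: each task in the response carries a worker_id identifying the worker (pod) it landed on.

# worker_id in the output shows which worker runs each task
curl -s http://127.0.0.1:8083/connectors/postgres_sink_dis/status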