Unable to understand Kafka Connect in distributed mode

Asked: 2020-02-01 13:38:05

Tags: apache-kafka apache-kafka-connect

I started Kafka Connect in standalone mode, as shown below:

/usr/local/confluent/bin/connect-standalone /usr/local/confluent/etc/kafka/connect-standalone.properties /usr/local/confluent/etc/kafka-connect-elasticsearch/quickstart-elasticsearch.properties

After that, I created a connector with all its details using the REST API, like this:

curl  -X POST -H "Content-Type: application/json" --data '{"name":"elastic-search-sink-audit","config":{"connector.class":"io.confluent.connect.elasticsearch.ElasticsearchSinkConnector","tasks.max":"5","topics":"fsp-AUDIT_EVENT_DEMO","key.ignore":"true","connection.url":"https://**.amazonaws.com","type.name":"kafka-connect-distributed","name":"elastic-search-sink-audit","errors.tolerance":"all","errors.deadletterqueue.topic.name":"fsp-dlq-audit-event"}}' http://localhost:8083/connectors | jq

After that, when I checked it, I could see 5 tasks running:

curl  localhost:8083/connectors/elastic-search-sink-audit/tasks | jq

Question 1:

Does this mean my Kafka connector is running in distributed mode, or only in standalone mode?

Question 2:

Do I need to modify the connect-distributed.properties file and start it the same way I started standalone mode?

Question 3:

Currently everything runs on a single EC2 instance. If I add 5 more EC2 instances to make the connector more parallel and faster, how will Kafka Connect know that 5 instances have been added and that it has to share the workload among them?

Question 4: Do I have to start Kafka Connect and create the connector on every EC2 instance? How can I confirm that all 5 instances are working properly on the same connector?

Finally, I tried to start the connector in distributed mode. First, I started it like this:

/usr/local/confluent/bin/connect-distributed /usr/local/confluent/etc/kafka/connect-distributed.properties /usr/local/confluent/etc/kafka-connect-elasticsearch/quickstart-elasticsearch.properties

Then, in another session, using the REST API:

curl  -X POST -H "Content-Type: application/json" --data '{"name":"elastic-search-sink-audit","config":{"connector.class":"io.confluent.connect.elasticsearch.ElasticsearchSinkConnector","tasks.max":"5","topics":"fsp-AUDIT_EVENT_DEMO","key.ignore":"true","connection.url":"https://**.amazonaws.com","type.name":"kafka-connect-distributed","name":"elastic-search-sink-audit","errors.tolerance":"all","errors.deadletterqueue.topic.name":"fsp-dlq-audit-event"}}' http://localhost:8083/connectors | jq

But as soon as I ran that, I started getting errors like this:

rror: NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender:598)
[2020-02-01 13:48:15,551] WARN [Producer clientId=producer-3] Got error produce response with correlation id 159 on topic-partition connect-configs-0, retrying (2147483496 attempts left). Error: NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender:598)
[2020-02-01 13:48:15,652] WARN [Producer clientId=producer-3] Got error produce response with correlation id 160 on topic-partition connect-configs-0, retrying (2147483495 attempts left). Error: NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender:598)
[2020-02-01 13:48:15,753] WARN [Producer clientId=producer-3] Got error produce response with correlation id 161 on topic-partition connect-configs-0, retrying (2147483494 attempts left). Error: NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender:598)
[2020-02-01 13:48:15,854] WARN [Producer clientId=producer-3] Got error produce response with correlation id 162 on topic-partition connect-configs-0, retrying (2147483493 attempts left). Error: NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender:598)
[2020-02-01 13:48:15,956] WARN [Producer clientId=producer-3] Got error produce response with correlation id 163 on topic-partition connect-configs-0, retrying (2147483492 attempts left). Error: NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender:598)

And when I tried to create the connector with curl, the request eventually timed out:

{ "error_code": 500, "message": "Request timed out" }

Please help me understand this.

1 Answer:

Answer 0 (score: 2)

Both modes start the REST API.
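For example, in either mode you can list the registered connectors over the REST API (assuming the default listener on port 8083):

curl -s http://localhost:8083/connectors | jq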

Distributed mode does not accept a connector properties file; the connector config has to be POSTed to the REST API. In your standalone run, you did not need to POST it separately, because the connector you supplied on the command line was already running.
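A minimal sketch of the distributed workflow, assuming the stock connect-distributed.properties and that your connector JSON body is saved in a file named elastic-search-sink-audit.json (the file name is just an example):

# Start the worker with ONLY the worker config -- no connector properties file
/usr/local/confluent/bin/connect-distributed /usr/local/confluent/etc/kafka/connect-distributed.properties

# Then register the connector over the REST API, using the same JSON you already have
curl -X POST -H "Content-Type: application/json" --data @elastic-search-sink-audit.json http://localhost:8083/connectors | jq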

Distributed mode is recommended because the connector's state is stored back into Kafka topics rather than in files on the single machine running standalone mode.
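For reference, that state lives in three internal topics named in connect-distributed.properties (the names below are the common defaults, and connect-configs matches the topic in your error log; yours may differ):

config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status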

For more details, see Kafka Connect Concepts.

How will Kafka Connect understand that 5 EC2 instances have been added and that it has to share the workload?

Do I have to start Kafka Connect on all the EC2 instances? How can I confirm that all 5 instances are working properly on the same connector?

Well, unless your EC2 machines are part of some orchestrated cluster, they won't know to start any processes on their own, so you have to start distributed mode yourself on each machine, with the same settings (Confluent's Ansible repository makes this quite easy).
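Concretely, "the same settings" means every worker's connect-distributed.properties points at the same Kafka cluster, uses the same group.id, and names the same internal storage topics shown above. A sketch (the broker addresses and group name are placeholders):

bootstrap.servers=broker1:9092,broker2:9092,broker3:9092
group.id=connect-cluster

Workers that share a group.id form one Connect cluster and automatically rebalance connectors and tasks among themselves as members join or leave.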

You can hit the /status endpoint on any of the Connect servers to see which addresses are running which tasks.
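For example, with the connector name from your question (the worker_id field in each task shows which host:port is running it):

curl -s http://localhost:8083/connectors/elastic-search-sink-audit/status | jq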

NOT_ENOUGH_REPLICAS

This happens because you don't have enough brokers for the internal Kafka Connect topics that track state to be created and written to.
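In other words, if the replication factor (or the brokers' min.insync.replicas) configured for those internal topics exceeds the number of live brokers, the worker's internal producer keeps retrying with NOT_ENOUGH_REPLICAS, which is what your log shows. A sketch of the relevant worker settings for a single-broker test cluster (lower them only for testing; keep 3 on a production cluster with at least 3 brokers):

config.storage.replication.factor=1
offset.storage.replication.factor=1
status.storage.replication.factor=1

If the internal topics were already created with a higher replication factor, you would also need to fix or recreate them on the broker side; you can inspect them with kafka-topics --describe (exact path and flags vary by version).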