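Connector transforms not mapping the partition key correctly

Time: 2019-01-18 21:45:57

Tags: apache-kafka apache-kafka-connect

I have a simple database table in MySQL with the columns (id varchar(255), val varchar(255), ..., ...)

I set up Kafka Connect to stream the table into a topic with twenty partitions (CONNECT_TOPIC). I also have another topic (STREAM_TOPIC), which is populated by a Kafka producer and also has 20 partitions.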

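The problem is that a key written by the connector lands on a different partition in CONNECT_TOPIC than the same key does in STREAM_TOPIC. This means I cannot join the two topics. I believe this is because the ID is extracted incorrectly.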

Here is a sample of the output:

Stream Task ID 0_13 Partition number 13 Consumed CONNECT_EVENT 68f52084-cfc9-4997-a28e-57cfd4f7bbbf
Stream Task ID 0_13 Partition number 13 Consumed JOINED CONNECT_EVENT 68f52084-cfc9-4997-a28e-57cfd4f7bbbf
Stream Task ID 0_17 Partition number 17 Consumed STREAM_EVENT 68f52084-cfc9-4997-a28e-57cfd4f7bbbf
Stream Task ID 0_17 Partition number 17 Consumed JOINED STREAM_EVENT 68f52084-cfc9-4997-a28e-57cfd4f7bbbf

Stream Task ID 0_7 Partition number 7 Consumed STREAM_EVENT 32aaa88d-b175-4a54-8338-d542ed051e6a
Stream Task ID 0_7 Partition number 7 Consumed JOINED STREAM_EVENT 32aaa88d-b175-4a54-8338-d542ed051e6a
Stream Task ID 0_17 Partition number 17 Consumed CONNECT_EVENT 32aaa88d-b175-4a54-8338-d542ed051e6a
Stream Task ID 0_17 Partition number 17 Consumed JOINED CONNECT_EVENT 32aaa88d-b175-4a54-8338-d542ed051e6a

Stream Task ID 0_11 Partition number 11 Consumed CONNECT_EVENT 90265a93-adac-4e93-856c-d1498eeeb22e
Stream Task ID 0_11 Partition number 11 Consumed JOINED CONNECT_EVENT 90265a93-adac-4e93-856c-d1498eeeb22e
Stream Task ID 0_11 Partition number 11 Consumed STREAM_EVENT 90265a93-adac-4e93-856c-d1498eeeb22e
Stream Task ID 0_11 Partition number 11 Consumed JOINED STREAM_EVENT 90265a93-adac-4e93-856c-d1498eeeb22e
Stream Task ID 0_11 Partition number 11 Merged 90265a93-adac-4e93-856c-d1498eeeb22e
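The mismatch in the output above can be checked mechanically. The following is a minimal Python sketch, assuming the log format shown in the sample; the helper names `partitions_by_key` and `co_partitioned` are hypothetical, and the `JOINED` and `Merged` lines are left out of the input for brevity.

```python
import re
from collections import defaultdict

# Lines copied from the sample output (JOINED and Merged lines omitted).
LOG = """\
Stream Task ID 0_13 Partition number 13 Consumed CONNECT_EVENT 68f52084-cfc9-4997-a28e-57cfd4f7bbbf
Stream Task ID 0_17 Partition number 17 Consumed STREAM_EVENT 68f52084-cfc9-4997-a28e-57cfd4f7bbbf
Stream Task ID 0_7 Partition number 7 Consumed STREAM_EVENT 32aaa88d-b175-4a54-8338-d542ed051e6a
Stream Task ID 0_17 Partition number 17 Consumed CONNECT_EVENT 32aaa88d-b175-4a54-8338-d542ed051e6a
Stream Task ID 0_11 Partition number 11 Consumed CONNECT_EVENT 90265a93-adac-4e93-856c-d1498eeeb22e
Stream Task ID 0_11 Partition number 11 Consumed STREAM_EVENT 90265a93-adac-4e93-856c-d1498eeeb22e
"""

# Captures: partition number, event type, record key.
LINE = re.compile(
    r"Partition number (\d+) Consumed (CONNECT_EVENT|STREAM_EVENT) (\S+)"
)


def partitions_by_key(log):
    """Map each record key to {event type: partition} as seen in the log."""
    seen = defaultdict(dict)
    for match in LINE.finditer(log):
        partition, event, key = match.groups()
        seen[key][event] = int(partition)
    return dict(seen)


def co_partitioned(events):
    """True only when both sources delivered the key on the same partition."""
    connect = events.get("CONNECT_EVENT")
    stream = events.get("STREAM_EVENT")
    return connect is not None and connect == stream


result = partitions_by_key(LOG)
for key, events in result.items():
    status = "OK" if co_partitioned(events) else "MISMATCH"
    print(key, events, status)
```

Only the third key (`90265a93-...`) is seen on the same partition from both topics, which matches the fact that it is the only one with a `Merged` line.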

I tried the following connector configuration to transform the ID:

{
        "name": "CONNECTOR",
        "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",

                "key.converter": "org.apache.kafka.connect.json.JsonConverter",

                "key.converter.schemas.enable":"false",

                "value.converter": "org.apache.kafka.connect.json.JsonConverter",
                "value.converter.schemas.enable":"false",


                "connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=pw",

                "table.whitelist": "CONNECTOR",

                "mode": "timestamp",

                "timestamp.column.name": "update_ts",

                "validate.non.null": "false",

                "transforms":"createKey,extractId, castString",
                "transforms.createKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
                "transforms.createKey.fields":"id",
                "transforms.extractId.type":"org.apache.kafka.connect.transforms.ExtractField$Key",
                "transforms.extractId.field":"id",
                "transforms.castString.type": "org.apache.kafka.connect.transforms.Cast$Key",
                "transforms.castString.spec": "string",

                "topic.prefix": "enrichment-"
        }
}

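This extracts the ID perfectly, but the records are mapped to the wrong partitions. I also tried extractId on its own, without the cast to string, but the same thing happened. I cannot find clear documentation anywhere on exactly how these transforms should be included.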
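To see which bytes end up as the record key, here is a minimal Python sketch that models the three transforms from the configuration and then the key converter. It is a plain-Python model, not Kafka Connect code: the function names are hypothetical stand-ins, the sample row is invented apart from the id taken from the log output, and it assumes that `JsonConverter` with `schemas.enable=false` writes a string key as a JSON string literal (quotes included), while `org.apache.kafka.connect.storage.StringConverter` writes the bare UTF-8 text.

```python
import json


def value_to_key(record, fields):
    """Model of ValueToKey: the key becomes a struct of the listed value fields."""
    key = {name: record["value"][name] for name in fields}
    return {**record, "key": key}


def extract_field_key(record, field):
    """Model of ExtractField$Key: the key becomes a single field of the struct."""
    return {**record, "key": record["key"][field]}


def cast_key_to_string(record):
    """Model of Cast$Key with spec 'string': the whole key is cast to a string."""
    return {**record, "key": str(record["key"])}


def json_converter(key):
    """Assumed JsonConverter behaviour without schemas: JSON-encode the key."""
    return json.dumps(key).encode("utf-8")


def string_converter(key):
    """Assumed StringConverter behaviour: the key's text as UTF-8 bytes."""
    return str(key).encode("utf-8")


# Hypothetical row; only the id comes from the sample output.
record = {
    "key": None,
    "value": {"id": "68f52084-cfc9-4997-a28e-57cfd4f7bbbf", "val": "example"},
}

# Same order as "transforms": "createKey,extractId, castString".
record = value_to_key(record, ["id"])
record = extract_field_key(record, "id")
record = cast_key_to_string(record)

json_key_bytes = json_converter(record["key"])
string_key_bytes = string_converter(record["key"])

print("key after transforms:", record["key"])
print("JsonConverter bytes:  ", json_key_bytes)
print("StringConverter bytes:", string_key_bytes)
```

Under these assumptions the key looks identical after the transforms, yet the serialized bytes differ by the surrounding quote characters. If the producer that fills STREAM_TOPIC serializes the key as a plain string, one thing that may be worth checking is whether setting `key.converter` to `org.apache.kafka.connect.storage.StringConverter` makes the bytes match.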

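The problem in short:

I need to extract the id field from the row, make it the record key, and make sure it behaves the same way as when I use a Kafka producer: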

KafkaProducer.produce("string key", event)

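If I populate both topics with a producer, the records end up on the correct partitions, but something about Connect maps them to different partitions even though the key is the same.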
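One way to reason about "same key, different partition" is to look at the serialized key bytes rather than the key text. The sketch below assumes the Java client's default partitioner for keyed records, which as far as I understand it computes `toPositive(murmur2(keyBytes)) % numPartitions`. The `murmur2` function is a hand-written Python port and should be verified against `org.apache.kafka.common.utils.Utils.murmur2` before trusting exact partition numbers; `default_partition` is a hypothetical helper name. It also assumes the Connect side writes the key JSON-encoded (with quotes) while the producer writes plain UTF-8 text. A producer built on a different client library may use a different hash entirely.

```python
import json

MASK = 0xFFFFFFFF  # emulate Java's 32-bit int arithmetic


def murmur2(data: bytes) -> int:
    """Python port of the 32-bit Murmur2 hash (unsigned result)."""
    length = len(data)
    seed = 0x9747B28C
    m = 0x5BD1E995
    r = 24

    h = (seed ^ length) & MASK

    # Mix four bytes at a time, read as a little-endian integer.
    for i in range(length // 4):
        k = int.from_bytes(data[i * 4:i * 4 + 4], "little")
        k = (k * m) & MASK
        k ^= k >> r
        k = (k * m) & MASK
        h = (h * m) & MASK
        h ^= k

    # Handle the last one to three bytes, if any.
    tail = data[length & ~3:]
    remaining = length % 4
    if remaining == 3:
        h ^= tail[2] << 16
    if remaining >= 2:
        h ^= tail[1] << 8
    if remaining >= 1:
        h ^= tail[0]
        h = (h * m) & MASK

    # Final avalanche.
    h ^= h >> 13
    h = (h * m) & MASK
    h ^= h >> 15
    return h


def default_partition(key_bytes: bytes, num_partitions: int) -> int:
    """Assumed default partitioning: clear the sign bit, then take the modulo."""
    return (murmur2(key_bytes) & 0x7FFFFFFF) % num_partitions


KEY = "68f52084-cfc9-4997-a28e-57cfd4f7bbbf"
NUM_PARTITIONS = 20

plain_bytes = KEY.encode("utf-8")            # what a string serializer would send
json_bytes = json.dumps(KEY).encode("utf-8")  # the same key wrapped in quotes

plain_partition = default_partition(plain_bytes, NUM_PARTITIONS)
json_partition = default_partition(json_bytes, NUM_PARTITIONS)

print("plain key bytes ->", plain_partition)
print("JSON key bytes  ->", json_partition)
```

Because the hash is computed over bytes, two keys that print the same but serialize differently are independent inputs to the partitioner, so they only share a partition by chance.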

0 answers:

No answers yet