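I have a simple database table in MySQL with columns (id varchar(255), val varchar(255), ..., ...).
I set up Kafka Connect to stream the table into a topic (CONNECT_TOPIC) with twenty partitions. I also have another topic (STREAM_TOPIC), which is populated by a Kafka producer and also has 20 partitions.
The problem is that the connector maps keys to different partitions in CONNECT_TOPIC than the producer does for the same keys in STREAM_TOPIC. This means I cannot join the two topics. I believe this is because the ID is being extracted incorrectly.
Here is a sample of the output: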
Stream Task ID 0_13 Partition number 13 Consumed CONNECT_EVENT 68f52084-cfc9-4997-a28e-57cfd4f7bbbf
Stream Task ID 0_13 Partition number 13 Consumed JOINED CONNECT_EVENT 68f52084-cfc9-4997-a28e-57cfd4f7bbbf
Stream Task ID 0_17 Partition number 17 Consumed STREAM_EVENT 68f52084-cfc9-4997-a28e-57cfd4f7bbbf
Stream Task ID 0_17 Partition number 17 Consumed JOINED STREAM_EVENT 68f52084-cfc9-4997-a28e-57cfd4f7bbbf
Stream Task ID 0_7 Partition number 7 Consumed STREAM_EVENT 32aaa88d-b175-4a54-8338-d542ed051e6a
Stream Task ID 0_7 Partition number 7 Consumed JOINED STREAM_EVENT 32aaa88d-b175-4a54-8338-d542ed051e6a
Stream Task ID 0_17 Partition number 17 Consumed CONNECT_EVENT 32aaa88d-b175-4a54-8338-d542ed051e6a
Stream Task ID 0_17 Partition number 17 Consumed JOINED CONNECT_EVENT 32aaa88d-b175-4a54-8338-d542ed051e6a
Stream Task ID 0_11 Partition number 11 Consumed CONNECT_EVENT 90265a93-adac-4e93-856c-d1498eeeb22e
Stream Task ID 0_11 Partition number 11 Consumed JOINED CONNECT_EVENT 90265a93-adac-4e93-856c-d1498eeeb22e
Stream Task ID 0_11 Partition number 11 Consumed STREAM_EVENT 90265a93-adac-4e93-856c-d1498eeeb22e
Stream Task ID 0_11 Partition number 11 Consumed JOINED STREAM_EVENT 90265a93-adac-4e93-856c-d1498eeeb22e
Stream Task ID 0_11 Partition number 11 Merged 90265a93-adac-4e93-856c-d1498eeeb22e
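To make the mismatch in the output above easier to see, here is a minimal sketch that groups the consumed events by key and reports the keys whose CONNECT_EVENT and STREAM_EVENT partitions differ. The function names and the regular expression are illustrative only, and the sample is limited to the non-JOINED lines copied from the output above.

```python
import re
from collections import defaultdict

# Non-JOINED lines copied from the sample output above.
LOG = """\
Stream Task ID 0_13 Partition number 13 Consumed CONNECT_EVENT 68f52084-cfc9-4997-a28e-57cfd4f7bbbf
Stream Task ID 0_17 Partition number 17 Consumed STREAM_EVENT 68f52084-cfc9-4997-a28e-57cfd4f7bbbf
Stream Task ID 0_7 Partition number 7 Consumed STREAM_EVENT 32aaa88d-b175-4a54-8338-d542ed051e6a
Stream Task ID 0_17 Partition number 17 Consumed CONNECT_EVENT 32aaa88d-b175-4a54-8338-d542ed051e6a
Stream Task ID 0_11 Partition number 11 Consumed CONNECT_EVENT 90265a93-adac-4e93-856c-d1498eeeb22e
Stream Task ID 0_11 Partition number 11 Consumed STREAM_EVENT 90265a93-adac-4e93-856c-d1498eeeb22e
"""

# Matches only the plain "Consumed <SOURCE> <key>" lines, not the JOINED ones.
LINE = re.compile(
    r"Partition number (\d+) Consumed (CONNECT_EVENT|STREAM_EVENT) (\S+)"
)


def partitions_by_key(log):
    """Return {key: {source: partition}} for every matching log line."""
    seen = defaultdict(dict)
    for line in log.splitlines():
        match = LINE.search(line)
        if match:
            partition, source, key = match.groups()
            seen[key][source] = int(partition)
    return dict(seen)


def mismatched(seen):
    """Keep only the keys whose two sources landed on different partitions."""
    return {
        key: sources
        for key, sources in seen.items()
        if sources.get("CONNECT_EVENT") != sources.get("STREAM_EVENT")
    }


if __name__ == "__main__":
    for key, sources in mismatched(partitions_by_key(LOG)).items():
        print(key, sources)
```

In this sample only the key ending in `b22e` lands on the same partition (11) for both sources, which is also the only key that reaches the Merged step.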
I tried the following connector configuration to transform the ID:
"name": "CONNECTOR",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable":"false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable":"false",
"connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=pw",
"table.whitelist": "CONNECTOR",
"mode": "timestamp",
"timestamp.column.name": "update_ts",
"validate.non.null": "false",
"transforms":"createKey,extractId, castString",
"transforms.createKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createKey.fields":"id",
"transforms.extractId.type":"org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractId.field":"id",
"transforms.castString.type": "org.apache.kafka.connect.transforms.Cast$Key",
"transforms.castString.spec": "string",
"topic.prefix": "enrichment-"
}
}
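This extracts the ID perfectly, but the records map to the wrong partition. I also tried extractId instead of extractString, but the same thing happened. I cannot find clear documentation anywhere on exactly how to include these transforms.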
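For reference, the intended effect of the three transforms (createKey, extractId, castString) can be sketched on a plain dictionary. This is only a model of what ValueToKey, ExtractField$Key and Cast$Key do to the record key, not Kafka Connect code; the record layout, the function names and the sample `val` value are assumptions made for illustration.

```python
def value_to_key(record, fields):
    """Model of ValueToKey: the key becomes a struct of the listed value fields."""
    return {**record, "key": {name: record["value"][name] for name in fields}}


def extract_field_key(record, field):
    """Model of ExtractField$Key: the key becomes the single named field."""
    return {**record, "key": record["key"][field]}


def cast_key(record, spec):
    """Model of Cast$Key with a whole-key spec; only "string" is handled here."""
    if spec != "string":
        raise ValueError("this sketch only models spec='string'")
    return {**record, "key": str(record["key"])}


def apply_transforms(record):
    """Apply the chain in the configured order: createKey, extractId, castString."""
    record = value_to_key(record, ["id"])
    record = extract_field_key(record, "id")
    return cast_key(record, "string")


# A source record as the JDBC connector might emit it: no key, the row as value.
# The "val" content is a made-up placeholder.
row_record = {
    "key": None,
    "value": {"id": "68f52084-cfc9-4997-a28e-57cfd4f7bbbf", "val": "example"},
}

if __name__ == "__main__":
    print(apply_transforms(row_record)["key"])
```

The sketch only covers the logical key. How that key is turned into bytes afterwards is decided by `key.converter`, which is a separate step from the transforms.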
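The problem in short:
I need to extract the id field from the row, make it the record key, and make sure it behaves the same as it does when using a Kafka producer: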
KafkaProducer.produce("string key", event)
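If I populate both topics with a producer, the records end up on the correct partitions, but something about Connect maps them to different partitions, even though the key is the same.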
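One way to check whether "the same key" really is the same on the wire is to compare the serialized key bytes, since the partition is chosen from those bytes rather than from the logical value. The sketch below is a Python port of the murmur2-based rule used by the Java client's default partitioner for keyed records (positive murmur2 hash modulo the partition count). It rests on two assumptions that are not confirmed by the setup above: that the STREAM_TOPIC producer serializes the key as plain UTF-8 text, and that JsonConverter with schemas disabled writes a string key as a JSON string literal, including the surrounding double quotes. All function names are illustrative.

```python
import json

MASK32 = 0xFFFFFFFF


def murmur2(data: bytes) -> int:
    """32-bit murmur2 with the seed used by the Kafka Java client (unsigned result)."""
    seed = 0x9747B28C
    m = 0x5BD1E995
    r = 24
    length = len(data)
    h = (seed ^ length) & MASK32
    # Mix four bytes at a time, read as little-endian integers.
    for i in range(length // 4):
        k = int.from_bytes(data[i * 4:i * 4 + 4], "little")
        k = (k * m) & MASK32
        k ^= k >> r
        k = (k * m) & MASK32
        h = (h * m) & MASK32
        h ^= k
    # Mix in the remaining 1 to 3 bytes, if any.
    tail = length & ~3
    remainder = length % 4
    if remainder == 3:
        h ^= data[tail + 2] << 16
    if remainder >= 2:
        h ^= data[tail + 1] << 8
    if remainder >= 1:
        h ^= data[tail]
        h = (h * m) & MASK32
    h ^= h >> 13
    h = (h * m) & MASK32
    h ^= h >> 15
    return h


def partition_for(key_bytes: bytes, num_partitions: int) -> int:
    """Clear the sign bit, then take the hash modulo the partition count."""
    return (murmur2(key_bytes) & 0x7FFFFFFF) % num_partitions


def plain_key_bytes(key: str) -> bytes:
    # Assumption: the producer for STREAM_TOPIC writes the key as raw UTF-8.
    return key.encode("utf-8")


def json_key_bytes(key: str) -> bytes:
    # Assumption: a schemaless JSON converter writes the key as a quoted JSON string.
    return json.dumps(key).encode("utf-8")


if __name__ == "__main__":
    key = "68f52084-cfc9-4997-a28e-57cfd4f7bbbf"
    for label, data in (("plain", plain_key_bytes(key)), ("json", json_key_bytes(key))):
        print(label, data, partition_for(data, 20))
```

If the two printed partitions differ, the extra quote characters in the JSON-encoded key would be enough to explain why Connect and the producer disagree, in which case the key converter rather than the transform chain would be the setting to look at.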