I'm using Kafka Connect from the Confluent community platform to keep MySQL databases in sync. Both the source and the sink are MySQL databases. It doesn't work.
Some problems I've run into:
There are tables in other databases on the same server that I don't want read into Kafka, but the Kafka Connect source keeps trying to read those other databases.
I want to use org.apache.kafka.connect.json.JsonConverter in both the source and the sink connectors, but the sink connector fails to insert the data correctly.
I am syncing multiple databases, and tables in different databases may share the same table name. How do I avoid table-name collisions, and how do I get the sink connector to route the Kafka topics so that data is inserted into the correct database?
[MySQL synchronization illustration]
The Kafka JDBC source connector configuration is:
{
  "name": "br-auths-3910472223-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "true",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "true",
    "tasks.max": "1",
    "connection.url": "jdbc:mysql://localhost:3306/br_auths?user=root&password=123456",
    "database.whitelist": "br_auths",
    "table.blacklist": "br_auths.__migrationversions,br_auths.auths_service_apps",
    "mode": "timestamp",
    "timestamp.column.name": "utime",
    "validate.non.null": "false",
    "incrementing.column.name": "id",
    "topic.prefix": "br_auths__"
  }
}
The Kafka JDBC sink connector configuration is:
{
  "name": "br-auths-3910472223-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "true",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "true",
    "tasks.max": "1",
    "connection.url": "jdbc:mysql://rm-hp303a0n2vr8970.mysql.huhehaote.rds.aliyuncs.com:3306/dev-br-auths-391047222?user=br_auths&password=@123456",
    "topics": "br_auths__auths_roles,br_auths__auths_user_logins,br_auths__auths_user_roles,br_auths__auths_users,br_auths__auths_user_claims,br_auths__auths_user_tokens,br_auths__auths_role_claims",
    "auto.create": "true",
    "insert.mode": "upsert",
    "transforms": "dropTopicPrefix",
    "transforms.dropTopicPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.dropTopicPrefix.regex": "br_auths__(.*)",
    "transforms.dropTopicPrefix.replacement": "$1"
  }
}
I want to create several pairs of source and sink connectors for different databases, so that the whitelisted tables in database A on MySQL server A are incrementally synced to database A on MySQL server B.
Update 1:
I switched to a Debezium source connector with a JDBC sink connector, running on distributed Kafka Connect. The source connector is:
{
  "name": "br-auths-3910472223-source",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "localhost",
    "database.port": "3306",
    "database.user": "root",
    "database.password": "br123456",
    "database.useLegacyDatetimeCode": "false",
    "database.server.id": "184",
    "database.server.name": "local3910472223",
    "database.whitelist": "br_auths",
    "database.history.kafka.bootstrap.servers": "localhost:9092",
    "database.history.kafka.topic": "schema-changes.br-auths.local3910472223",
    "table.blacklist": "br_auths.__migrationversions,br_auths.auths_service_apps",
    "include.schema.changes": "true",
    "transforms": "route,TimestampConverter",
    "transforms.TimestampConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
    "transforms.TimestampConverter.target.type": "string",
    "transforms.TimestampConverter.field": "payload.after.ctime",
    "transforms.TimestampConverter.format": "yyyy-MM-dd HH:mm:ss",
    "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
    "transforms.route.replacement": "$2__$3"
  }
}
The sink connector is:
{
  "name": "br-auths-3910472223-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "connection.url": "jdbc:mysql://rm-hp303a0n2.mysql.huhehaote.rds.aliyuncs.com:3306/dev-br-auths-391047222?useLegacyDatetimeCode=false&user=br_auths&password=123456",
    "dialect.name": "MySqlDatabaseDialect",
    "topics.regex": "br_auths__(.*)",
    "transforms": "dropTopicPrefix,unwrap",
    "transforms.dropTopicPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.dropTopicPrefix.regex": "br_auths__(.*)",
    "transforms.dropTopicPrefix.replacement": "$1",
    "transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
    "insert.mode": "upsert",
    "pk.fields": "Id",
    "pk.mode": "record_value"
  }
}
The Avro message converted to JSON looks like this:
{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "struct",
        "fields": [
          { "type": "string", "optional": false, "field": "Id" },
          { "type": "string", "optional": false, "field": "UserId" },
          { "type": "string", "optional": false, "field": "RoleId" },
          { "type": "string", "optional": true, "field": "APPID" },
          { "type": "int32", "optional": false, "default": 0, "field": "IsDeleted" },
          { "type": "int64", "optional": false, "name": "io.debezium.time.Timestamp", "version": 1, "default": 0, "field": "ctime" },
          { "type": "int64", "optional": false, "name": "io.debezium.time.Timestamp", "version": 1, "default": 0, "field": "utime" }
        ],
        "optional": true,
        "name": "local3910472223.br_auths.auths_user_roles.Value",
        "field": "before"
      },
      {
        "type": "struct",
        "fields": [
          { "type": "string", "optional": false, "field": "Id" },
          { "type": "string", "optional": false, "field": "UserId" },
          { "type": "string", "optional": false, "field": "RoleId" },
          { "type": "string", "optional": true, "field": "APPID" },
          { "type": "int32", "optional": false, "default": 0, "field": "IsDeleted" },
          { "type": "int64", "optional": false, "name": "io.debezium.time.Timestamp", "version": 1, "default": 0, "field": "ctime" },
          { "type": "int64", "optional": false, "name": "io.debezium.time.Timestamp", "version": 1, "default": 0, "field": "utime" }
        ],
        "optional": true,
        "name": "local3910472223.br_auths.auths_user_roles.Value",
        "field": "after"
      },
      {
        "type": "struct",
        "fields": [
          { "type": "string", "optional": true, "field": "version" },
          { "type": "string", "optional": false, "field": "name" },
          { "type": "int64", "optional": false, "field": "server_id" },
          { "type": "int64", "optional": false, "field": "ts_sec" },
          { "type": "string", "optional": true, "field": "gtid" },
          { "type": "string", "optional": false, "field": "file" },
          { "type": "int64", "optional": false, "field": "pos" },
          { "type": "int32", "optional": false, "field": "row" },
          { "type": "boolean", "optional": true, "default": false, "field": "snapshot" },
          { "type": "int64", "optional": true, "field": "thread" },
          { "type": "string", "optional": true, "field": "db" },
          { "type": "string", "optional": true, "field": "table" },
          { "type": "string", "optional": true, "field": "query" }
        ],
        "optional": false,
        "name": "io.debezium.connector.mysql.Source",
        "field": "source"
      },
      { "type": "string", "optional": false, "field": "op" },
      { "type": "int64", "optional": true, "field": "ts_ms" }
    ],
    "optional": false,
    "name": "local3910472223.br_auths.auths_user_roles.Envelope"
  },
  "payload": {
    "before": null,
    "after": {
      "Id": "DB4DA841364860D112C3C76BDCB36635",
      "UserId": "0000000000",
      "RoleId": "5b7e5f9b4bc00d89c4cf96ae",
      "APPID": "br.region2",
      "IsDeleted": 0,
      "ctime": 1550138524000,
      "utime": 1550138524000
    },
    "source": {
      "version": "0.8.3.Final",
      "name": "local3910472223",
      "server_id": 0,
      "ts_sec": 0,
      "gtid": null,
      "file": "mysql-bin.000003",
      "pos": 64606,
      "row": 0,
      "snapshot": true,
      "thread": null,
      "db": "br_auths",
      "table": "auths_user_roles",
      "query": null
    },
    "op": "c",
    "ts_ms": 1550568556614
  }
}
Columns of MySQL type datetime are serialized as a large integer, and the JDBC sink connector fails when it tries to insert that value into a MySQL datetime column. So I added transforms.TimestampConverter to the source connector config above, but the ctime and utime columns are unchanged. What's wrong?
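For context: the stock TimestampConverter SMT only matches top-level fields of the record value, so a nested path such as payload.after.ctime never matches anything inside the Debezium envelope and the transform silently does nothing. One possible arrangement, sketched below and not verified against this exact setup, is to do the conversion in the sink connector instead, after UnwrapFromEnvelope has flattened the record; the convertCtime and convertUtime aliases are made-up names for illustration:
    "transforms": "dropTopicPrefix,unwrap,convertCtime,convertUtime",
    "transforms.dropTopicPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.dropTopicPrefix.regex": "br_auths__(.*)",
    "transforms.dropTopicPrefix.replacement": "$1",
    "transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
    "transforms.convertCtime.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
    "transforms.convertCtime.field": "ctime",
    "transforms.convertCtime.target.type": "Timestamp",
    "transforms.convertUtime.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
    "transforms.convertUtime.field": "utime",
    "transforms.convertUtime.target.type": "Timestamp"
Transforms run in the order listed, so by the time the converters see the record, unwrap has already lifted ctime and utime to the top level; target.type Timestamp turns the epoch-milliseconds int64 into a Connect Timestamp, which the JDBC sink can map to a MySQL datetime column.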
Answer 0 (score: 0)
To answer your specific questions: This article will address many of them. In particular:
There are tables in other databases on the same server that I don't want read into Kafka, but the Kafka Connect source keeps trying to read those other databases.
Use a combination of table.whitelist, table.blacklist, and schema.pattern as appropriate. If you can't match the required pattern with a single connector, you will need multiple connectors to achieve the setup you want.
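As a rough sketch only, not tested against this schema: restricting the JDBC source to one database and an explicit table list might look like the fragment below, reusing table names from the configs above. Note that whether MySQL surfaces the database as a JDBC schema or a catalog can vary by driver and connector version, so the schema.pattern value may need adjusting:
    "connection.url": "jdbc:mysql://localhost:3306/br_auths?user=root&password=123456",
    "schema.pattern": "br_auths",
    "table.whitelist": "auths_users,auths_roles,auths_user_roles"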
I want to use org.apache.kafka.connect.json.JsonConverter in both the source connector and the sink connector, but the sink connector fails to insert the data correctly.
Without an explanation of what "fails to insert correctly" means, this is hard to answer. In general, I would use Avro, since it has richer schema support and more efficient messages (no embedded schema, because the schema is held in the Schema Registry). See here for more details.
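For reference, switching to Avro is a converter change on both connectors; a minimal sketch, assuming a Schema Registry running at http://localhost:8081 (that URL is an assumption):
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://localhost:8081",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081"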
I am syncing multiple databases, and tables in different databases may share the same table name. How do I avoid table-name collisions, and how do I get the sink connector to route the Kafka topics so that data is inserted into the correct database?
You will need to use topic.prefix on the source connector to mark the topics coming from a particular source, and then the RegexRouter single message transform (which you have already found) to manipulate the topic names further in the source and/or sink connectors. You may need multiple sink connectors, each using topics.regex to select the specific topics to route to a particular database.
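To make that concrete, here is a sketch of one such sink; the serverA__ prefix, the connector name, and the target connection URL are all hypothetical, and the matching source connector would have to produce that same prefix via topic.prefix (JDBC source) or a RegexRouter replacement (Debezium):
{
  "name": "serverA-dbA-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "topics.regex": "serverA__(.*)",
    "transforms": "dropTopicPrefix",
    "transforms.dropTopicPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.dropTopicPrefix.regex": "serverA__(.*)",
    "transforms.dropTopicPrefix.replacement": "$1",
    "connection.url": "jdbc:mysql://server-b:3306/db_a?user=...&password=..."
  }
}
A second sink connector with topics.regex serverB__(.*) and its own connection.url would then route the serverB topics into the other target database; identically named tables never collide, because each prefix is handled by its own connector pair.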