We use this configuration on the source side:
{
  "name": "mongo-source-1",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "tasks.max": "3",
    "connection.uri": "mongodb://mongo1:27017,mongo2:27017,mongo3:27017",
    "topic.prefix": "mongo",
    "database": "test",
    "collection": "investigate1",
    "change.stream.full.document": "updateLookup",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "key.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter.schemas.enable": "false"
  }
}
But the data it produces looks like this:
"somefield": {
"$numberLong": "2342423432432432434324"
}
Then, when we sink it to S3, we can't run Athena queries against it, because they break on the $.
How can we get regular JSON out of this source connector so this isn't a problem?
To be clear, we just want it to look like this:
"somefield": 2342423432432432434324
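For illustration, this is the kind of transformation we are after. A minimal post-processing sketch in plain Python (the helper name is ours, not part of any connector) that strips MongoDB Extended JSON type wrappers like `{"$numberLong": "..."}` down to plain values:

```python
import json

def simplify_extended_json(value):
    """Recursively replace Extended JSON type wrappers such as
    {"$numberLong": "5"} with plain JSON numbers."""
    if isinstance(value, dict):
        if set(value.keys()) == {"$numberLong"}:
            return int(value["$numberLong"])
        if set(value.keys()) == {"$numberInt"}:
            return int(value["$numberInt"])
        if set(value.keys()) == {"$numberDouble"}:
            return float(value["$numberDouble"])
        return {k: simplify_extended_json(v) for k, v in value.items()}
    if isinstance(value, list):
        return [simplify_extended_json(v) for v in value]
    return value

record = json.loads('{"somefield": {"$numberLong": "2342423432432432434324"}}')
print(simplify_extended_json(record))
# -> {'somefield': 2342423432432432434324}
```

Obviously we would rather not run an extra processing step between the connector and S3, which is why we are asking whether the connector itself can emit this shape.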
The official MongoDB Kafka connector guide is no help here; it doesn't even discuss the key.converter and value.converter parameters.
Maybe there is an option on the sink side that could change this?
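One thing we have not been able to verify yet: newer versions of the official MongoDB source connector may expose an output.json.formatter property that switches the output away from canonical Extended JSON (this is an assumption on our part; it would need checking against the installed connector version). If it exists, the source config would gain a line like:

```
"output.json.formatter": "com.mongodb.kafka.connect.source.json.formatter.SimplifiedJson"
```

If someone can confirm whether this property (or a sink-side equivalent) is the right fix, that would answer the question.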
Here is our sink config:
{
  "name": "s3-sink-1",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "3",
    "topics.dir": "topics",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "topics": "mongo.test.investigate1",
    "s3.bucket.name": "superawesomebucketname",
    "s3.region": "us-east-2",
    "s3.part.size": "5242880",
    "flush.size": "1000",
    "rotate.schedule.interval.ms": "5000",
    "timezone": "UTC",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage"
  }
}