Correct configuration for the MongoDB Kafka Connect connector when sinking data to S3

Time: 2020-04-09 15:19:11

Tags: mongodb amazon-s3 apache-kafka apache-kafka-connect

We are using this configuration on the source side:

{"name": "mongo-source-1",
      "config": {
        "connector.class":"com.mongodb.kafka.connect.MongoSourceConnector",
        "tasks.max":"3",  
        "connection.uri":"mongodb://mongo1:27017,mongo2:27017,mongo3:27017",
        "topic.prefix":"mongo",
        "database":"test",
        "collection":"investigate1",
        "change.stream.full.document": "updateLookup",
        "key.converter":"org.apache.kafka.connect.storage.StringConverter",  
        "key.converter.schemas.enable":"false",
        "value.converter":"org.apache.kafka.connect.storage.StringConverter",
        "value.converter.schemas.enable":"false"
    }}

But the data it produces looks like this:

"somefield": {
      "$numberLong": "2342423432432432434324"
    }

Then, when we sink it to S3, we cannot run Athena queries against the data because they break on the $.

How can we get this source connector to produce regular JSON so that this is not a problem?

To be clear, we just want it to look like this:

"somefield": 2342423432432432434324"

The official MongoDB Kafka connector guide is no help here; it does not even discuss the key.converter and value.converter parameters.
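
For reference, newer releases of the MongoDB source connector (1.3 and later) add an output.json.formatter setting that controls how BSON types are rendered. A minimal sketch of the source config using the SimplifiedJson formatter, assuming such a version is available (this is an assumption, not something the guide above shows):

{
  "name": "mongo-source-1",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "tasks.max": "3",
    "connection.uri": "mongodb://mongo1:27017,mongo2:27017,mongo3:27017",
    "topic.prefix": "mongo",
    "database": "test",
    "collection": "investigate1",
    "change.stream.full.document": "updateLookup",
    "output.json.formatter": "com.mongodb.kafka.connect.source.json.formatter.SimplifiedJson",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter"
  }
}

With the SimplifiedJson formatter, values such as Int64 fields should be written as plain JSON numbers or strings rather than wrapped in $numberLong.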

Or maybe there is an option on the sink side that could change this?

Here is our sink configuration:

{
  "name": "s3-sink-1",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "3",
    "topics.dir": "topics",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "topics": "mongo.test.investigate1",
    "s3.bucket.name": "superawesomebucketname",
    "s3.region": "us-east-2",
    "s3.part.size": "5242880",
    "flush.size": "1000",
    "rotate.schedule.interval.ms": "5000",
    "timezone": "UTC",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage"
  }
}
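
Either config is applied through the Kafka Connect REST API. A sketch, assuming a Connect worker whose REST interface listens on localhost:8083 and the sink config above saved as s3-sink-1.json (both the host/port and the file name are assumptions):

# Submit the connector config to the Connect worker's REST endpoint.
# Adjust localhost:8083 to wherever your worker actually listens.
curl -X POST -H "Content-Type: application/json" \
  --data @s3-sink-1.json \
  http://localhost:8083/connectors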

0 Answers:

No answers yet