How to configure the sink connector for non-JSON messages when streaming data from Kafka to MongoDB

Time: 2019-07-09 08:52:34

Tags: mongodb apache-kafka apache-kafka-connect

I can stream data from Kafka to MongoDB.

I streamed a JSON message (e.g. {"id": 1, "test": 123}) with kafka-console-producer.sh, and it was successfully inserted into MongoDB.
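
A sketch of how both messages (the working JSON one and the plain string discussed below) can be produced with the console producer; the broker address here is an assumption:

# Produce the JSON message, then the plain string, to the "test" topic
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
> {"id": 1, "test": 123}
> abc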

However, if I stream a plain string message (e.g. abc), then I get the following error because of JSON parsing:

[2019-07-09 14:44:30,365] ERROR WorkerSinkTask{id=mongo-sink-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:487)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:464)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:320)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error:
        at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:344)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:487)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
        ... 13 more
Caused by: org.apache.kafka.common.errors.SerializationException: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'abc': was expecting ('true', 'false' or 'null')
 at [Source: (byte[])"abc"; line: 1, column: 7]
Caused by: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'abc': was expecting ('true', 'false' or 'null')
 at [Source: (byte[])"abc"; line: 1, column: 7]
        at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
        at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:703)
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3532)
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2627)
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:832)
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:729)
        at com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4042)
        at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2571)
        at org.apache.kafka.connect.json.JsonDeserializer.deserialize(JsonDeserializer.java:50)
        at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:342)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:487)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:487)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:464)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:320)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
[2019-07-09 14:44:30,374] ERROR WorkerSinkTask{id=mongo-sink-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)

The sink connector configuration file is MongoSinkConnector.properties, with the following content (I created it based on this source):

name=mongo-sink
topics=test
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
tasks.max=1
key.ignore=true

# Specific global MongoDB Sink Connector configuration
connection.uri=mongodb://localhost:27017
database=test_kafka
collection=transaction
max.num.retries=3
retries.defer.timeout=5000
type.name=kafka-connect

key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false

I have 2 separate questions:

  1. How can I ignore messages that are not in JSON format?

  2. How can I define a default key for such messages?

1 Answer:

Answer 0 (score: 0):

Strings are "JSON format"; however, the JsonConverter typically expects a JSON object, or in some cases null, rather than a plain string.

There is no property for setting a default value on errors, but you can simply skip the failing events, or send them to another topic; if you want those messages to end up in the database as well, you will have to consume and fix them separately. This is configured via the errors.tolerance settings (only available since Kafka Connect 2.0).
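
For example, a sketch of the relevant error-handling properties in the sink configuration (the dead letter queue topic name here is illustrative):

# Skip records that fail conversion instead of killing the task
errors.tolerance=all
# Log failures so the skipped records remain visible
errors.log.enable=true
# Optionally route the raw failed records to a dead letter queue topic
errors.deadletterqueue.topic.name=dlq.mongo-sink
errors.deadletterqueue.topic.replication.factor=1
# Optionally record the failure reason in the record headers
errors.deadletterqueue.context.headers.enable=true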

Reference - https://www.confluent.io/blog/kafka-connect-deep-dive-error-handling-dead-letter-queues

If the data is just "abc", then I would suggest using the StringConverter rather than the JsonConverter.
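
A minimal sketch of that change in MongoSinkConnector.properties:

# Treat record values as plain strings instead of parsing them as JSON
value.converter=org.apache.kafka.connect.storage.StringConverter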

However, if you want to wrap all events in a JSON object (not only the plain strings), then you will have to add a transforms property:

transforms=MakeMap
# HoistField$Value wraps each record's value in a struct/map with a single field
transforms.MakeMap.type=org.apache.kafka.connect.transforms.HoistField$Value
# The original value is placed under this field name
transforms.MakeMap.field=non-json

This will turn "abc" into {"non-json":"abc"}, but it will also turn {"id": 1, "test": 123} into {"non-json":{"id": 1, "test": 123}}, so you need to be careful about what you produce into the topic.
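
To sanity-check the result, assuming the database and collection names from the sink properties above, you could query MongoDB from the mongo shell:

// mongo shell; database and collection names taken from MongoSinkConnector.properties
use test_kafka
db.transaction.find().pretty()
// documents produced by the transform should look like: { "non-json" : "abc" }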