I'm running Kafka Connect v2.12 in standalone mode, with the JDBC connector plugin v5.2.1.
I'm using JSON serialization, with the schema embedded in the payload.
Since the JDBC connector is fail-fast by design, it enters the FAILED state and stops processing messages whenever it receives a problematic message.
That's fine as long as the JDBC connector only processes the latest messages: a bad message crashes it, it can be restarted, and it picks up at the next (hopefully well-formed) message.
However, my JDBC connector has started* reading all historical messages on startup. I noticed in the startup logs that auto.offset.reset is set to earliest. That's odd, because the default is latest, and my worker.properties file did not set consumer.auto.offset.reset to earliest, latest, or anything else. In any case, I edited the worker.properties file to set consumer.auto.offset.reset to latest, as shown below.
Judging by the startup logs, which now show auto.offset.reset=latest, the change took effect, yet the connector still crashes on every startup because it tries to process a week-old, malformed JSON message.
Which settings should I modify so that my Kafka Connect worker only picks up Kafka messages sent after the worker has started?
* Until last week, the connector only read the latest messages. I don't know whether I messed up something in my configuration or whether someone else changed a global setting on the Kafka brokers, but since last week it has been reading all messages, starting from the earliest, on every startup.
worker.properties
# This file was based on https://github.com/boundary/dropwizard-kafka/blob/master/kafka-connect/src/main/resources/kafka-connect/example.standalone.worker.properties
offset.storage.file.filename=/tmp/example.offsets
bootstrap.servers=kafka-0.kafka-headless.kafka:9092,kafka-1.kafka-headless.kafka:9092,kafka-2.kafka-headless.kafka:9092
offset.flush.interval.ms=10000
rest.port=8083
rest.advertised.port=8083
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true
# Prevent the connector from pulling all historical messages
consumer.auto.offset.reset=latest
For information, here is the list of historical messages. Guess which message triggered the error, lol.
{"schema":{"type":"struct","fields":[{"type":"int64","optional":false,"field":"id"},{"type":"string","optional":false,"field":"status"}],"optional":false,"name":"example_topic"},"payload":{"id":1337,"status":"success"}}
{"schema":{"type":"struct","fields":[{"type":"int64","optional":false,"field":"id"},{"type":"string","optional":false,"field":"status"}],"optional":false,"name":"example_topic"},"payload":{"id":1337,"status":"success"}}
{"schema":{"type":"struct","fields":[{"type":"int64","optional":false,"field":"id"},{"type":"string","optional":false,"field":"status"}],"optional":false,"name":"example_topic"},"payload":{"id":1337,"status":"success"}}
{"schema":{"type":"struct","fields":[{"type":"int64","optional":false,"field":"id"},{"type":"string","optional":false,"field":"status"}],"optional":false,"name":"example_topic"},"payload":{"id":1337,"status":"success"}}
kafka_connect/bin/kafka-console-producer.sh \
--broker-list kafka-0.kafka-headless.kafka:9092,kafka-1.kafka-headless.kafka:9092,kafka-2.kafka-headless.kafka:9092 \
--topic example_topic
{"schema":{"type":"struct","fields":[{"type":"int64","optional":false,"field":"id"},{"type":"string","optional":false,"field":"status"}],"optional":false,"name":"example_topic"},"payload":{"id":1337,"status":"success"}}
And here are the logs and the error message that appear on startup:
[2019-07-30 14:20:55,020] INFO Initializing writer using SQL dialect: PostgreSqlDatabaseDialect (io.confluent.connect.jdbc.sink.JdbcSinkTask:57)
[2019-07-30 14:20:55,021] INFO WorkerSinkTask{id=postgres_sink-0} Sink task finished initialization and start (org.apache.kafka.connect.runtime.WorkerSinkTask:302)
[2019-07-30 14:20:55,031] INFO Cluster ID: DPpxrPbVR5qwiI9vz_Gkkw (org.apache.kafka.clients.Metadata:273)
[2019-07-30 14:20:55,031] INFO [Consumer clientId=consumer-1, groupId=connect-postgres_sink] Discovered group coordinator kafka-2.kafka-headless.kafka:9092 (id: 2147483645 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:677)
[2019-07-30 14:20:55,033] INFO [Consumer clientId=consumer-1, groupId=connect-postgres_sink] Revoking previously assigned partitions [] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:462)
[2019-07-30 14:20:55,033] INFO [Consumer clientId=consumer-1, groupId=connect-postgres_sink] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:509)
[2019-07-30 14:20:58,046] INFO [Consumer clientId=consumer-1, groupId=connect-postgres_sink] Successfully joined group with generation 884 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:473)
[2019-07-30 14:20:58,048] INFO [Consumer clientId=consumer-1, groupId=connect-postgres_sink] Setting newly assigned partitions [example_topic-0] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:280)
[2019-07-30 14:20:58,097] ERROR WorkerSinkTask{id=postgres_sink-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:177)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:513)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:490)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:321)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:225)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:193)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error:
at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:334)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:513)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 13 more
Caused by: org.apache.kafka.common.errors.SerializationException: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'kafka_connect': was expecting ('true', 'false' or 'null')
at [Source: (byte[])"kafka_connect/bin/kafka-console-producer.sh \"; line: 1, column: 15]
Caused by: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'kafka_connect': was expecting ('true', 'false' or 'null')
at [Source: (byte[])"kafka_connect/bin/kafka-console-producer.sh \"; line: 1, column: 15]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:703)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3532)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2627)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:832)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:729)
at com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4042)
at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2571)
at org.apache.kafka.connect.json.JsonDeserializer.deserialize(JsonDeserializer.java:50)
at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:332)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:513)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:513)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:490)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:321)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:225)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:193)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Answer 0 (score: 1)
I believe the problem you're facing is that the "consumer.auto.offset.reset" setting only applies to consumer groups that have no stored offsets.
For example, suppose you have a consumer group starting up for the first time. It looks for stored offsets, doesn't find any, and so falls back to the "consumer.auto.offset.reset" setting. For this example, assume it is set to "earliest", so the consumer starts at the beginning of the log, processes some messages, and commits its offsets (standard consumer behavior). Next, you decide you don't want that, so you set "consumer.auto.offset.reset=latest" and restart. The consumer group again looks up its offsets, and this time it finds them, because it committed offsets earlier, so it never consults the offset-reset setting (you did set it to "latest", but because committed offsets exist, that setting is ignored).
Probably, for some reason, your consumer was originally using "earliest", and now that the consumer group has committed offsets, you can't simply start at the latest.
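For reference, one way to check whether the group already has committed offsets is the kafka-consumer-groups.sh tool. This is only a sketch; the group id connect-postgres_sink and the broker address are taken from the question's logs and config:

kafka_connect/bin/kafka-consumer-groups.sh \
  --bootstrap-server kafka-0.kafka-headless.kafka:9092 \
  --describe \
  --group connect-postgres_sink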
If you want to fix this, you can either change the name of the consumer group (I'm not sure whether Kafka Connect exposes this) or use the kafka-consumer-groups.sh script that ships with Kafka to set the offsets to the latest.
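As a sketch, resetting the group's offsets to the latest with that script might look like this (group id, topic, and broker address taken from the question; the connector has to be stopped first, because offsets can only be reset while the group is inactive):

kafka_connect/bin/kafka-consumer-groups.sh \
  --bootstrap-server kafka-0.kafka-headless.kafka:9092 \
  --group connect-postgres_sink \
  --topic example_topic \
  --reset-offsets --to-latest \
  --execute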
Hope this helps.
Answer 1 (score: 0)
Kafka Connect processes a message in three stages (for a sink connector): converting the message (Converter), applying transformations (Transformation), and delivering it to the sink task.
If a message is invalid (the Converter cannot convert it correctly), you can set the following properties to skip that message:
"errors.tolerance": "all",
"errors.log.enable":true,
"errors.log.include.messages":true
From the Kafka Connect configuration documentation:
errors.tolerance:
Behavior for tolerating errors during connector operation. "none" is the default value and signals that any error will result in an immediate connector task failure; "all" changes the behavior to skip over problematic records.
errors.log.enable:
If true, write each error and the details of the failed operation and problematic record to the Connect application log. This is "false" by default, so that only errors that are not tolerated are reported.
errors.log.include.messages:
Whether to include in the log the Connect record that resulted in a failure. This is "false" by default, which will prevent record keys, values, and headers from being written to log files, although some information such as the topic and partition number will still be logged.
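Since the question uses standalone mode with properties files, these settings would go into the sink connector's own properties file rather than worker.properties. A minimal sketch, assuming the JDBC sink connector is defined in a file such as postgres_sink.properties (that file name is only an illustration):

# postgres_sink.properties (assumed file name for the JDBC sink connector config)
# Skip records that cannot be converted, and log them for inspection
errors.tolerance=all
errors.log.enable=true
errors.log.include.messages=true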