We are using the Kafka Connect HDFS connector, which continuously pulls data from Kafka topics and commits it to HDFS.
After 12 + 11 hours of successful loading, we suddenly saw this error on the connector side:
org.apache.kafka.clients.consumer.NoOffsetForPartitionException: Undefined offset with no reset policy for partition: Prd_IN_GeneralEvents-39
    at org.apache.kafka.clients.consumer.internals.Fetcher.resetOffset(Fetcher.java:374)
    at org.apache.kafka.clients.consumer.internals.Fetcher.resetOffsetsIfNeeded(Fetcher.java:227)
    at org.apache.kafka.clients.consumer.KafkaConsumer.updateFetchPositions(KafkaConsumer.java:1592)
    at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1035)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.pollConsumer(WorkerSinkTask.java:360)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:245)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:179)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:148)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:139)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:182)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:744)
After that, some of the HDFS worker tasks (9 out of 100) were killed, and we started losing data.
What is the root cause of this error?
We have set auto.offset.reset=latest in the connect.distributed.properties file.
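For context, here is a minimal stand-alone consumer sketch (not our connector code; the broker address, group id, topic subscription and the "none" reset policy are assumptions for illustration) of the situation in which the client throws NoOffsetForPartitionException: poll() fails when an assigned partition has no committed offset and the effective auto.offset.reset policy does not allow an automatic reset.

    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.NoOffsetForPartitionException;

    public class OffsetResetDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder connection settings -- adjust for your environment.
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "offset-reset-demo");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            // With "none" there is no fallback: poll() throws NoOffsetForPartitionException
            // as soon as a partition has no committed offset. "latest" or "earliest" would
            // silently reset to the end or the beginning of the partition instead.
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "none");

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("Prd_IN_GeneralEvents"));
                consumer.poll(1000L); // poll(long) matches the older client in the stack trace above
            } catch (NoOffsetForPartitionException e) {
                System.err.println("No committed offset and no usable reset policy: " + e.getMessage());
            }
        }
    }

Since our worker config sets auto.offset.reset=latest, we would not expect this exception at all (assuming that setting actually reaches the sink task's consumer), which is part of why the root cause is unclear to us.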