我正在运行一个具有7个节点和大量流处理功能的Kafka集群。现在,我在Kafka Streams应用程序中看到了很少出现的错误,例如在高输入速率下:
[2018-07-23 14:44:24,351] ERROR task [0_5] Error sending record to topic topic-name. No more offsets will be recorded for this task and the exception will eventually be thrown (org.apache.kafka.streams.processor.internals.RecordCollectorImpl) org.apache.kafka.common.errors.TimeoutException: Expiring 13 record(s) for topic-name-3: 60060 ms has passed since last append
[2018-07-23 14:44:31,021] ERROR stream-thread [StreamThread-2] Failed to commit StreamTask 0_5 state: (org.apache.kafka.streams.processor.internals.StreamThread) org.apache.kafka.streams.errors.StreamsException: task [0_5] exception caught when producing at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.checkForException(RecordCollectorImpl.java:121) at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.flush(RecordCollectorImpl.java:129) at org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:76) at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:188) at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:281) at org.apache.kafka.streams.processor.internals.StreamThread.commitOne(StreamThread.java:807) at org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:794) at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:769) at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:647) at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:361) Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 13 record(s) for topic-name-3: 60060 ms has passed since last append
[2018-07-23 14:44:31,033] ERROR stream-thread [StreamThread-2] Failed while executing StreamTask 0_5 due to flush state: (org.apache.kafka.streams.processor.internals.StreamThread) org.apache.kafka.streams.errors.StreamsException: task [0_5] exception caught when producing at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.checkForException(RecordCollectorImpl.java:121) at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.flush(RecordCollectorImpl.java:129) at org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:423) at org.apache.kafka.streams.processor.internals.StreamThread$4.apply(StreamThread.java:555) at org.apache.kafka.streams.processor.internals.StreamThread.performOnTasks(StreamThread.java:501) at org.apache.kafka.streams.processor.internals.StreamThread.flushAllState(StreamThread.java:551) at org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:449) at org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:391) at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:372) Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 13 record(s) for topic-name-3: 60060 ms has passed since last append
[2018-07-23 14:44:31,039] WARN stream-thread [StreamThread-2] Unexpected state transition from RUNNING to NOT_RUNNING. (org.apache.kafka.streams.processor.internals.StreamThread) Exception in thread "StreamThread-2" org.apache.kafka.streams.errors.StreamsException: task [0_5] exception caught when producing at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.checkForException(RecordCollectorImpl.java:121) at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.flush(RecordCollectorImpl.java:129) at org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:76) at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:188) at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:281) at org.apache.kafka.streams.processor.internals.StreamThread.commitOne(StreamThread.java:807) at org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:794) at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:769) at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:647) at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:361) Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 13 record(s) for topic-name-3: 60060 ms has passed since last append
如果我降低输入速率(从20k事件/ s到10k事件/ s),错误将消失。所以很明显我正在达到任何极限。我玩过不同的选项(request.timeout.ms,linger.ms和batch.size),但是每次都得到相同的结果。
答案 0 :(得分:0)
您似乎已达到某种极限。根据消息0631/com.stunntech.fluttertodoapp E/flutter: [ERROR:topaz/lib/tonic/logging/dart_error.cc(16)] Unhandled exception:
type '(Database, int) => void' is not a subtype of type '(Database, int) => Future<dynamic>'
#0 DatabaseHelper.initDb (package:flutter_todo_app/database/database.dart:29:64)
<asynchronous suspension>
#1 DatabaseHelper.db (package:flutter_todo_app/database/database.dart:20:17)
<asynchronous suspension>
#2 DatabaseHelper.saveTodo (package:flutter_todo_app/database/database.dart:40:26)
<asynchronous suspension>
#3 _MyTodoListState._submitTodo (package:flutter_todo_app/todo_list.dart:144:30)
<asynchronous suspension>
#4 _MyTodoListState._showAlert.<anonymous closure> (package:flutter_todo_app/todo_list.dart:97:19)
#5 GestureRecognizer.invokeCallback (package:flutter/src/gestures/recognizer.dart:102:24)
#6 TapGestureRecognizer._checkUp (package:flutter/src/gestures/tap.dart:161:9)
#7 TapGestureRecognizer.handlePrimaryPointer (package:flutter/src/gestures/tap.dart:94:7)
#8 PrimaryPointerGestureRecognizer.handleEvent (package:flutter/src/gestures/recognizer.dart:315:9)
#9 PointerRouter._dispatch (package:flutter/src/gestures/pointer_router.dart:73:12)
#10 PointerRouter.route (package:flutter/src/gestures/pointer_router.dart:101:11)
#11 _WidgetsFlutterBinding&BindingBase&GestureBinding.handleEvent (package:flutter/src/gestures/binding.dart:143:19)
#12 _WidgetsFlutterBinding&BindingBase&GestureBinding.dispatchEvent (package:flutter/src/gestures/binding.dart:121:22)
#13 _WidgetsFlutterBinding&BindingBase&GestureBinding._handlePointerEvent (package:flutter/src/gestures/binding.dart:101:7)
#14 _WidgetsFlutterBinding&BindingBase&GestureBinding._flushPointerEventQueue (package:flutter/src/gestures/binding.dart:64:7)
#15 _WidgetsFlutterBinding&BindingBase&GestureBinding._handlePointerDataPacket (package:flutter/src/gestures/binding.dart:48:7)
#16 _invoke1 (dart:ui/hooks.dart:134:13)
#17 _dispatchPointerDataPacket (dart:ui/hooks.dart:91:5)enter code here
,我认为这是由于高负载而使线程处于饥饿状态,因此磁盘将是首先要检查的东西:
答案 1 :(得分:0)
我们有类似的问题。 在我们的案例中,我们具有以下用于复制和确认的配置:
replication.factor: 3
producer.acks: all
在高负载下,同一错误多次发生TimeoutException: Expiring N record(s) for topic: N ms has passed since last append
。
在删除我们的自定义replication.factor
和producer.acks
配置(因此我们现在使用默认值)后,此错误消失了。
无疑,在生产者端要花费更多的时间,直到领导者将收到完整的同步副本以确认记录,直到用指定的replication.factor
复制记录为止。
使用默认值对您的容错能力的保护会稍差一些。
还可能考虑增加每个主题的分区数量和应用程序节点的数量(您的kafka流逻辑在其中处理)。