Question about NoResourceAvailableException in Flink

Asked: 2019-10-28 07:37:12

Tags: apache-flink

This is the error message:

2019-10-27 05:32:57,087 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Sink: Unnamed (34/40) (95aac9e47f777ddc73c7a29cc1091911) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Sink: Unnamed (35/40) (5181fb35b0a2eab588dd7ed2eb902bbd) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Sink: Unnamed (36/40) (bf4aac9423bdecaeeb7e6ac37001d73d) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Sink: Unnamed (37/40) (31f8ee4d7adbcfd5de21b4cbb83c5e05) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Sink: Unnamed (38/40) (8ba11f69e8e5ee2aacaa276136ad3bd0) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Sink: Unnamed (39/40) (1a1e38ede6b8d398b50b8fe7de2c6cb2) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Sink: Unnamed (40/40) (7fbb095da45b2d2392874fe4fa5c916d) switched from CREATED to SCHEDULED.
2019-10-27 05:37:57,088 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job Flink Streaming Job (4e5011eb97e695cfb2d05048534b097a) switched from state RUNNING to FAILING.
2019-10-27 05:37:57,088 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job Flink Streaming Job (4e5011eb97e695cfb2d05048534b097a) switched from state RUNNING to FAILING.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate all requires slots within timeout of 300000 ms. Slots required: 152, slots allocated: 150, previous allocation IDs: []

My parallelism settings:

source : 32
flatmap : 80
sink : 40

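Is the JobManager trying to request 152 slots from the ResourceManager, and does the job fail because the ResourceManager does not have enough slots? When no more slots are available, can't the ResourceManager obtain more slots from other TaskManagers?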

1 Answer:

Answer 0: (score: 0)

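The number of available slots is numberOfTaskmanagers x taskmanager.numberOfTaskSlots (for example, 75 TaskManagers with 2 slots each yield 150 slots). Flink itself cannot trigger any kind of dynamic scaling. All you can do is start more TaskManagers manually, or change the TaskManager configuration and restart the TaskManagers.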
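The slot arithmetic can be checked with a small sketch. The parallelism values come from the question; the 75 x 2 cluster is only the example used in this answer, not a confirmed description of the asker's setup. The helper names are hypothetical. The observation that 152 = 32 + 80 + 40 suggests the operators are not sharing slots, but that is an inference from the log rather than something the question states. With Flink's default slot sharing, a job needs only as many slots as its highest operator parallelism.

```python
import math

# Operator parallelism as given in the question.
parallelism = {"source": 32, "flatmap": 80, "sink": 40}


def available_slots(num_task_managers, slots_per_task_manager):
    """numberOfTaskmanagers x taskmanager.numberOfTaskSlots."""
    return num_task_managers * slots_per_task_manager


def required_slots(parallelism, slot_sharing=True):
    """Slots a job needs.

    With slot sharing, one slot can hold one subtask of every operator,
    so the highest parallelism decides. Without it, every subtask needs
    its own slot, so the parallelisms add up.
    """
    values = list(parallelism.values())
    return max(values) if slot_sharing else sum(values)


# Assumption: no slot sharing, which matches "Slots required: 152" in the log.
required = required_slots(parallelism, slot_sharing=False)

# Assumption: 75 TaskManagers with 2 slots each (the example in this answer).
slots_per_tm = 2
available = available_slots(75, slots_per_tm)

missing = max(0, required - available)
extra_task_managers = math.ceil(missing / slots_per_tm)

print(f"required={required}, available={available}, "
      f"missing={missing}, extra TaskManagers needed={extra_task_managers}")
```

Under these assumptions the job is two slots short, so one additional TaskManager would cover the request. If the operators shared slots instead, the same job would fit into 80 slots.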

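If a TaskManager dies while a job is running, you can define a restart strategy (keep in mind that you need to enable checkpointing for this): https://ci.apache.org/projects/flink/flink-docs-stable/dev/task_failure_recovery.html#restart-strategies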

If your TaskManagers died and were not restarted, it is most likely a YARN problem.