Flink: restoring from a savepoint (checkpoint) fails. Caused by: java.lang.IllegalStateException: There is no operator for the state

Time: 2019-02-06 09:23:25

Tags: apache-flink

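Question:

The Flink JobManager fails to restore from a checkpoint. Cause: java.lang.IllegalStateException: There is no operator for the state

Background: I am running Flink 1.6.3 on Kubernetes, with incremental checkpoints on the RocksDB state backend.

I tried passing the --allowNonRestoredState argument so that savepoint state that cannot be restored is skipped.

From my logs: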

  

2019-02-06 08:51:08.068 [main] INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  --allowNonRestoredState

2019-02-06 08:51:22.827 [flink-akka.actor.default-dispatcher-14] INFO  o.f.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Recovering checkpoints from ZooKeeper.
2019-02-06 08:51:22.883 [flink-akka.actor.default-dispatcher-14] INFO  o.f.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Found 1 checkpoints in ZooKeeper.
2019-02-06 08:51:22.883 [flink-akka.actor.default-dispatcher-14] INFO  o.f.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Trying to fetch 1 checkpoints from storage.
2019-02-06 08:51:22.884 [flink-akka.actor.default-dispatcher-14] INFO  o.f.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Trying to retrieve checkpoint 1612.
2019-02-06 08:51:22.977 [flink-akka.actor.default-dispatcher-14] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Restoring job 00000000000000000000000000000000 from latest valid checkpoint: Checkpoint 1612 @ 1549376250641 for 00000000000000000000000000000000000000.
2019-02-06 08:51:22.982 [flink-akka.actor.default-dispatcher-14] ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error occurred in the cluster entrypoint.
java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
    at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
    at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:176)
    at org.apache.flink.runtime.dispatcher.Dispatcher$DefaultJobManagerRunnerFactory.createJobManagerRunner(Dispatcher.java:1058)
    at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:308)
    at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34)
    ... 7 common frames omitted
Caused by: java.lang.IllegalStateException: There is no operator for the state b22e6e8baea7d7e562d5a233f3301ce1
    at org.apache.flink.runtime.checkpoint.StateAssignmentOperation.checkStateMappingCompleteness(StateAssignmentOperation.java:569)
    at org.apache.flink.runtime.checkpoint.StateAssignmentOperation.assignStates(StateAssignmentOperation.java:77)
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreLatestCheckpointedState(CheckpointCoordinator.java:1049)
    at org.apache.flink.runtime.jobmaster.JobMaster.createAndRestoreExecutionGraph(JobMaster.java:1138)
    at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:294)
    at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:157)
    ... 10 common frames omitted
2019-02-06 08:51:23.013 [TransientBlobCache shutdown hook] INFO  org.apache.flink.runtime.blob.TransientBlobCache - Shutting down BLOB cache
2019-02-06 08:51:23.033 [BlobServer shutdown hook] INFO  org.apache.flink.runtime.blob.BlobServer - Stopped BLOB server at 0.0.0.0:6124

Expected result:

The job starts from the latest checkpoint and skips any state that cannot be restored.
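
To make the difference between the expected and the observed behavior concrete, here is a minimal sketch of a state-to-operator completeness check. It is a simplified model written for illustration, not Flink's actual StateAssignmentOperation code: the class name StateMappingSketch, the method checkMapping, and the operator ID "hypothetical-operator-id" are all assumptions. Only the exception message and the state ID come from the log above. With the flag set to true, unmatched state is skipped; with false, the restore fails the way the stack trace shows. The log suggests the check ran as if the flag were false, but the question does not establish whether an argument passed to the cluster entrypoint reaches the checkpoint-recovery path in 1.6.3.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Simplified model of a state-mapping completeness check; NOT Flink's real implementation. */
public class StateMappingSketch {

    /**
     * Returns the operator IDs whose checkpointed state would be skipped.
     * Throws if some state has no matching operator and skipping is not allowed.
     */
    static List<String> checkMapping(Set<String> checkpointOperatorIds,
                                     Set<String> jobOperatorIds,
                                     boolean allowNonRestoredState) {
        List<String> skipped = new ArrayList<>();
        for (String id : checkpointOperatorIds) {
            if (jobOperatorIds.contains(id)) {
                continue; // a matching operator exists, so the state can be assigned
            }
            if (!allowNonRestoredState) {
                // same message text as in the log above
                throw new IllegalStateException("There is no operator for the state " + id);
            }
            skipped.add(id); // state is dropped instead of failing the restore
        }
        return skipped;
    }

    public static void main(String[] args) {
        // State ID taken from the log; the job's operator ID is hypothetical.
        Set<String> inCheckpoint = Collections.singleton("b22e6e8baea7d7e562d5a233f3301ce1");
        Set<String> inJob = Collections.singleton("hypothetical-operator-id");

        // Expected behavior: the unmatched state is reported as skipped.
        System.out.println("skipped: " + checkMapping(inCheckpoint, inJob, true));

        // Observed behavior: the restore fails.
        try {
            checkMapping(inCheckpoint, inJob, false);
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

In this model the failure can only occur when the checkpoint holds state for an operator ID that the submitted job graph does not contain, so comparing the operator IDs of the running job with the ID in the exception is a reasonable first diagnostic step.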

0 answers:

No answers yet.