Cannot restore a savepoint from Flink 1.2.1 in Flink 1.4

Posted: 2018-01-04 15:44:19

Tags: apache-flink flink-streaming

We have deployed a new Flink 1.4 cluster. When we try to restore savepoints taken on our old 1.2.1 deployment, every job we try to restore fails with the same error:

org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable failure. This suppresses job restarts. Please check the stack trace for the root cause.
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1360)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1336)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1336)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalStateException: Legacy state (from Flink <= 1.1, created through the 'Checkpointed' interface) is no longer supported starting from Flink 1.4. Please rewrite your job to use 'CheckpointedFunction' instead!
    at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
    at org.apache.flink.runtime.checkpoint.savepoint.SavepointV1Serializer.deserializeSubtaskState(SavepointV1Serializer.java:171)
    at org.apache.flink.runtime.checkpoint.savepoint.SavepointV1Serializer.deserialize(SavepointV1Serializer.java:96)
    at org.apache.flink.runtime.checkpoint.savepoint.SavepointV1Serializer.deserialize(SavepointV1Serializer.java:54)
    at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepointWithHandle(SavepointStore.java:278)
    at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:70)
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1141)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1350)
    ... 10 more

The error message:

Legacy state (from Flink <= 1.1, created through the 'Checkpointed' interface) is no longer supported starting from Flink 1.4. Please rewrite your job to use 'CheckpointedFunction' instead!

However, this looks wrong to us, since the savepoints come from our other deployment, which is running Flink 1.2.1, not 1.1.

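The upgrade documentation page for 1.4 (https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/upgrading.html) has not been updated yet, but parallelism seems to have caused problems in the past. I tried running the job with the same parallelism as the job the savepoint came from, but the problem remains.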

Any hints on what might be causing this and how to fix it?

Thanks!

2 Answers:

Answer 0 (score: 2)

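As of version 1.4.0, Flink no longer supports restoring state that was created through the Checkpointed interface. To perform a stateful upgrade, you have to do the following: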

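  1. Take a savepoint of your job while it is running on Flink 1.2.1
  2. Replace Checkpointed with CheckpointedFunction in all your stateful functions
  3. Implement the CheckpointedRestoring interface so the functions can restore from the Checkpointed savepoint
  4. Run the modified job on Flink 1.2.1 from the first savepoint and take a second savepoint
  5. Remove the CheckpointedRestoring interface from all your stateful functions
  6. Run the modified job on Flink 1.4.0 from the second savepoint

Let me know if you run into any other problems while migrating your job.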
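The second and third steps (switching to CheckpointedFunction while keeping CheckpointedRestoring for the intermediate 1.2.1 run) can be sketched as below. This is a minimal, hypothetical example, assuming the Flink 1.2.x DataStream API and a function that previously implemented Checkpointed<Long> to store a single counter; the class name CountingMapper, the state name "count" and the getCount() accessor are illustrative and not part of the original answer. It needs the Flink 1.2.1 streaming jars on the classpath, and the CheckpointedRestoring part is meant to be removed again before moving to 1.4.0.

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.checkpoint.CheckpointedRestoring;

// Hypothetical counting function, formerly "implements Checkpointed<Long>".
// CheckpointedFunction replaces Checkpointed; CheckpointedRestoring<Long> is kept
// only so the 1.2.1 run can still read the legacy state from the first savepoint.
public class CountingMapper extends RichMapFunction<String, String>
        implements CheckpointedFunction, CheckpointedRestoring<Long> {

    private long count;

    // Handle to non-keyed operator state; re-created in initializeState, never serialized.
    private transient ListState<Long> checkpointedCount;

    @Override
    public String map(String value) {
        count++;
        return value;
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // Overwrite the previous snapshot with the current counter value.
        checkpointedCount.clear();
        checkpointedCount.add(count);
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        checkpointedCount = context.getOperatorStateStore().getListState(
                new ListStateDescriptor<>("count", Long.class));
        if (context.isRestored()) {
            // List entries can be redistributed across subtasks on rescaling, so sum them.
            for (Long c : checkpointedCount.get()) {
                count += c;
            }
        }
    }

    @Override
    public void restoreState(Long state) throws Exception {
        // Only invoked when restoring legacy state written through Checkpointed<Long>.
        count = state;
    }

    public long getCount() {
        return count;
    }
}
```

The idea is that during the intermediate 1.2.1 run the legacy value arrives through restoreState, while every snapshot taken afterwards goes through snapshotState, so the second savepoint should no longer contain Checkpointed state. Once that savepoint exists, dropping CheckpointedRestoring<Long> and the restoreState method leaves a function that uses only CheckpointedFunction.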

Answer 1 (score: 0)

So, we finally figured out the problem.

We originally ran our jobs on Flink 1.1 and later migrated their savepoints to 1.2.1.

It seems that Flink 1.2.1 did not upgrade the savepoints in any way, so they were still in the old format, which Flink 1.4 no longer supports.

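The solution was to run our jobs from those savepoints on Flink 1.3 and take a new savepoint there, which is written in the new format. That savepoint was finally compatible with Flink 1.4 :)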