Flink savepoint akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/jobmanager_1]]

Time: 2018-09-20 10:02:35

Tags: apache-flink flink-streaming

When running a Flink (1.6.0) data stream job with RocksDB as the state backend and HDFS as the checkpoint storage, I get an akka.pattern.AskTimeoutException when I try to take a savepoint via the CLI:

org.apache.flink.util.FlinkException: Triggering a savepoint for the job e569cf53baecae9cb4fa794d590d670f failed.
    at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:714)
    at org.apache.flink.client.cli.CliFrontend.lambda$savepoint$9(CliFrontend.java:692)
    at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:979)
    at org.apache.flink.client.cli.CliFrontend.savepoint(CliFrontend.java:689)
    at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1059)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1120)
    at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1120)
Caused by: java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/jobmanager_15#-1160136947]] after [60000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
    at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
    at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
    at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
    at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
    at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:770)
    at akka.dispatch.OnComplete.internal(Future.scala:258)
    at akka.dispatch.OnComplete.internal(Future.scala:256)
    at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
    at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
    at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
    at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
    at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
    at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
    at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
    at java.lang.Thread.run(Thread.java:748)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/jobmanager_15#-1160136947]] after [60000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
    at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
    ... 9 more

I increased askTimeout from 10s to 60s, but it did not help. Also, after the exception, I can see that Flink created a 2.4 GB savepoint file on HDFS.
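For reference, the timeout change described above would presumably look like the following in flink-conf.yaml. This is only a sketch: it assumes that "askTimeout" refers to the akka.ask.timeout option (default 10 s), and the post does not show the actual configuration.

```yaml
# flink-conf.yaml (assumed location of the setting; not shown in the post)
# Raised from the default of 10 s; matches the "after [60000 ms]" in the trace above.
akka.ask.timeout: 60 s
```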

What should I do in this situation? Raising the timeout to a high value does not seem like a viable approach once the state grows further.

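Update: I suspect this is actually a bug in the CLI or the JobManager, because the logs show that the task managers were able to complete all savepoints within a few seconds. Truncated task manager log: https://pastebin.com/fvu4uhAf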

0 answers