Flink job fails to restore from checkpoint

Posted: 2021-06-10 15:11:52

Tags: apache-flink flink-streaming

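I have a Flink job that consumes data from Kafka, applies a few stateless flatMaps, and produces the results back to Kafka. It is a very low-volume job.

It normally checkpoints without problems. Whenever the job has to restore from a checkpoint, however, it fails to restore its state with the stack trace below.

The state is tiny. I believe it is just the Kafka offsets, and the job runs with AT_LEAST_ONCE semantics.

Every operator has .uid() set, and I am completely out of ideas.

This is the error when trying to restart from a checkpoint: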

java.lang.Exception: Exception while creating StreamOperatorStateContext.
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:254) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:272) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:427) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$2(StreamTask.java:543) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:533) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:573) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at java.lang.Thread.run(Thread.java:834) ~[?:?]
Caused by: org.apache.flink.util.FlinkException: Could not restore operator state backend for StreamSource_fb0ea8b0a502b80e8c29508f37436fa7_(1/1) from any of the 1 provided restore options.
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.operatorStateBackend(StreamTaskStateInitializerImpl.java:285) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:173) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    ... 9 more
Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed when trying to restore operator state backend
    at org.apache.flink.runtime.state.DefaultOperatorStateBackendBuilder.build(DefaultOperatorStateBackendBuilder.java:83) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.state.memory.MemoryStateBackend.createOperatorStateBackend(MemoryStateBackend.java:322) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$operatorStateBackend$0(StreamTaskStateInitializerImpl.java:276) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.operatorStateBackend(StreamTaskStateInitializerImpl.java:285) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:173) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    ... 9 more
Caused by: java.io.EOFException: No more bytes left.
    at org.apache.flink.api.java.typeutils.runtime.NoFetchingInput.require(NoFetchingInput.java:80) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at com.esotericsoftware.kryo.io.Input.readUtf8Chars_slow(Input.java:835) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at com.esotericsoftware.kryo.io.Input.readUtf8Chars(Input.java:828) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at com.esotericsoftware.kryo.io.Input.readString(Input.java:785) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:164) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:154) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:763) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at com.esotericsoftware.kryo.serializers.ReflectField.read(ReflectField.java:120) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:122) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:354) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:151) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:37) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at org.apache.flink.runtime.state.OperatorStateRestoreOperation.deserializeOperatorStateValues(OperatorStateRestoreOperation.java:217) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.state.OperatorStateRestoreOperation.restore(OperatorStateRestoreOperation.java:188) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.state.DefaultOperatorStateBackendBuilder.build(DefaultOperatorStateBackendBuilder.java:80) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.state.memory.MemoryStateBackend.createOperatorStateBackend(MemoryStateBackend.java:322) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$operatorStateBackend$0(StreamTaskStateInitializerImpl.java:276) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.operatorStateBackend(StreamTaskStateInitializerImpl.java:285) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:173) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    ... 9 more

During normal operation, the task manager logs this error whenever a checkpoint is taken:

WARN  org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer [] - Falling back to default Kryo serializer because Chill serializer couldn't be found.
java.lang.reflect.InvocationTargetException: null
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.getKryoInstance(KryoSerializer.java:444) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.checkKryoInitialized(KryoSerializer.java:467) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.copy(KryoSerializer.java:258) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.copy(TupleSerializer.java:115) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.copy(TupleSerializer.java:37) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at org.apache.flink.runtime.state.ArrayListSerializer.copy(ArrayListSerializer.java:75) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.state.PartitionableListState.<init>(PartitionableListState.java:64) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.state.PartitionableListState.deepCopy(PartitionableListState.java:76) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.state.DefaultOperatorStateBackendSnapshotStrategy.snapshot(DefaultOperatorStateBackendSnapshotStrategy.java:89) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.state.DefaultOperatorStateBackend.snapshot(DefaultOperatorStateBackend.java:234) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:213) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:162) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:371) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointStreamOperator(SubtaskCheckpointCoordinatorImpl.java:686) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.buildOperatorSnapshotFutures(SubtaskCheckpointCoordinatorImpl.java:607) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:572) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:298) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$9(StreamTask.java:1004) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:988) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:912) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$triggerCheckpointAsync$8(StreamTask.java:885) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93) [flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) [flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:317) [flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:189) [flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:617) [flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:581) [flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755) [flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570) [flink-dist_2.12-1.12.2.jar:1.12.2]
    at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: com.esotericsoftware.kryo.KryoException: Unable to resolve type variable: A
    at com.esotericsoftware.kryo.util.GenericsUtil.resolveTypeVariable(GenericsUtil.java:114) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at com.esotericsoftware.kryo.util.GenericsUtil.resolveTypeVariable(GenericsUtil.java:86) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at com.esotericsoftware.kryo.util.GenericsUtil.resolveType(GenericsUtil.java:41) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at com.esotericsoftware.kryo.util.Generics$GenericType.initialize(Generics.java:263) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at com.esotericsoftware.kryo.util.Generics$GenericType.<init>(Generics.java:228) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at com.esotericsoftware.kryo.util.Generics$GenericType.initialize(Generics.java:242) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at com.esotericsoftware.kryo.util.Generics$GenericType.<init>(Generics.java:228) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at com.esotericsoftware.kryo.serializers.CachedFields.addField(CachedFields.java:139) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at com.esotericsoftware.kryo.serializers.CachedFields.rebuild(CachedFields.java:99) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at com.esotericsoftware.kryo.serializers.FieldSerializer.<init>(FieldSerializer.java:82) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at com.esotericsoftware.kryo.serializers.FieldSerializer.<init>(FieldSerializer.java:68) ~[dpa-runner-0.5.28-20210610.142951-35.jar:?]
    at org.apache.flink.runtime.types.ScalaCollectionsRegistrar.useField$1(FlinkScalaKryoInstantiator.scala:93) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.types.ScalaCollectionsRegistrar.apply(FlinkScalaKryoInstantiator.scala:98) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.types.AllScalaRegistrar.apply(FlinkScalaKryoInstantiator.scala:172) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.types.FlinkScalaKryoInstantiator.newKryo(FlinkScalaKryoInstantiator.scala:84) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
    ... 35 more
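When you read the checkpoint-time trace, note that the WARN line and the exception describe a single event. KryoSerializer.getKryoInstance calls the Kryo instantiator through reflection (the Method.invoke frames above). The instantiator (FlinkScalaKryoInstantiator.newKryo) then throws, and the reflective call wraps that failure in an InvocationTargetException. The log text says the Chill serializer "couldn't be found", but the Caused by line shows the instantiator was found and invoked, and only then failed with "Unable to resolve type variable: A".

Below is a simplified, hypothetical sketch of that "try reflectively, fall back on failure" pattern. It uses only the JDK. FailingFactory and createOrFallback are made-up names, and this is not Flink's actual code.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class ReflectiveFallbackDemo {

    // Hypothetical stand-in for an instantiator that is found but fails
    // while building its instance, like newKryo() did in the trace above.
    public static class FailingFactory {
        public Object create() {
            throw new IllegalStateException("Unable to resolve type variable: A");
        }
    }

    // Tries to build an instance reflectively; on any failure, reports the fallback
    // together with the underlying reason.
    public static String createOrFallback(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            Object factory = clazz.getDeclaredConstructor().newInstance();
            Method create = clazz.getMethod("create");
            create.invoke(factory);
            return "custom";
        } catch (InvocationTargetException e) {
            // The invoked method itself threw: the real error is the cause, not the wrapper.
            return "fallback, cause: " + e.getCause();
        } catch (ReflectiveOperationException e) {
            // Class or method genuinely missing (e.g. ClassNotFoundException).
            return "fallback, class or method missing: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(createOrFallback(FailingFactory.class.getName()));
        System.out.println(createOrFallback("com.example.DoesNotExist"));
    }
}
```

Both calls take the fallback path. Only the cause tells "class not found" apart from "found, but it blew up". The same holds for the Flink warning above, which is why the Caused by section matters more than the WARN text.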

1 Answer:

Answer 0 (score: 0)

The IDE must have added this ****ty dependency:

<dependency>
    <groupId>de.javakaffee</groupId>
    <artifactId>kryo-serializers</artifactId>
    <version>0.45</version>
</dependency>

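I removed it, and now everything works as expected. Presumably it pulled a different Kryo version into the fat jar. In both traces the Kryo frames are loaded from dpa-runner-0.5.28-20210610.142951-35.jar, not from flink-dist, and Flink's own ScalaCollectionsRegistrar (from flink-dist) calls into them and fails. That suggests a clash with the Kryo version Flink is built against.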
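If you hit a similar clash, it can help to check which jar each Kryo class is actually loaded from. The sketch below uses only the JDK (Class.forName plus ProtectionDomain.getCodeSource()). WhichJar and locate are hypothetical names. Run it with the same classpath as the job, for example from inside the fat jar.

The default class names are taken from the traces above. As far as I can tell, CachedFields does not exist in the Kryo 2.x line that Flink 1.12 is built against. If it is present at all, that hints at a second Kryo version on the classpath.

```java
import java.security.CodeSource;

public class WhichJar {

    // Returns where the JVM loaded the given class from (usually a jar or directory URL).
    public static String locate(String className) {
        try {
            // initialize = false: look the class up without running its static initializers
            Class<?> clazz = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource source = clazz.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "bootstrap/JDK"; // JDK classes report no code source
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // Class names from the stack traces; pass other names as arguments to check them instead.
        String[] names = args.length > 0 ? args : new String[] {
            "com.esotericsoftware.kryo.Kryo",
            "com.esotericsoftware.kryo.serializers.CachedFields",
            "org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

On the build side, mvn dependency:tree shows which dependency brings Kryo in.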