Flink pipeline fails with "Could not complete snapshot" caused by ConcurrentModificationException

Time: 2020-03-10 07:40:25

Tags: apache-flink

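After enabling checkpointing for our Flink pipeline, we periodically get the exception below, which causes the pipeline to fail. The pipeline reads data from Kafka, applies some stateless transformations (maps), and then writes to HDFS through a StreamingFileSink.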

org.apache.flink.runtime.checkpoint.CheckpointException: Could not complete snapshot 1080 for operator foo -> bar -> Sink: Hadoop (1/2). Failure reason: Checkpoint was declined.
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:431)
        at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1282)
        at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1216)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:872)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:777)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:708)
        at org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:88)
        at org.apache.flink.streaming.runtime.io.CheckpointBarrierAligner.processBarrier(CheckpointBarrierAligner.java:113)
        at org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:155)
        at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.pollNextNullable(StreamTaskNetworkInput.java:102)
        at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.pollNextNullable(StreamTaskNetworkInput.java:47)
        at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:135)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:279)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.run(StreamTask.java:301)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:406)
        at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
        at java.util.HashMap$EntryIterator.next(HashMap.java:1479)
        at java.util.HashMap$EntryIterator.next(HashMap.java:1477)
        at org.apache.flink.api.common.typeutils.base.MapSerializer.copy(MapSerializer.java:105)
        at org.apache.flink.api.common.typeutils.base.MapSerializer.copy(MapSerializer.java:43)
        at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.copy(PojoSerializer.java:239)
        at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.copy(StreamElementSerializer.java:105)
        at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.copy(StreamElementSerializer.java:46)
        at org.apache.flink.runtime.state.ArrayListSerializer.copy(ArrayListSerializer.java:73)
        at org.apache.flink.runtime.state.PartitionableListState.<init>(PartitionableListState.java:68)
        at org.apache.flink.runtime.state.PartitionableListState.deepCopy(PartitionableListState.java:80)
        at org.apache.flink.runtime.state.DefaultOperatorStateBackendSnapshotStrategy.snapshot(DefaultOperatorStateBackendSnapshotStrategy.java:88)
        at org.apache.flink.runtime.state.DefaultOperatorStateBackend.snapshot(DefaultOperatorStateBackend.java:261)
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:406)
        ... 17 more
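
For readers unfamiliar with the innermost exception: the "Caused by" frames show MapSerializer.copy iterating over a HashMap when the iterator detects that the map was structurally modified after iteration began. The sketch below reproduces that fail-fast behaviour with plain JDK classes only. It is a single-threaded simulation, not Flink code: the class CmeDemo, the method copyEntries, and the Runnable hook are hypothetical names introduced for illustration, and the assumption that something else in the job mutates the map while the snapshot copies it is a guess, not something the trace proves.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class CmeDemo {

    // Hypothetical helper: copies a map entry by entry, similar in spirit to
    // what a generic map copier does. The hook runs after each copied entry
    // and stands in for "someone else touching the map during the copy".
    static Map<String, String> copyEntries(Map<String, String> source, Runnable duringCopy) {
        Map<String, String> copy = new HashMap<>(source.size());
        for (Map.Entry<String, String> entry : source.entrySet()) {
            copy.put(entry.getKey(), entry.getValue());
            duringCopy.run();
        }
        return copy;
    }

    // Builds a small map with three entries, so the iterator always has a
    // second element to advance to after the first one is copied.
    static Map<String, String> sampleMap() {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("a", "1");
        attributes.put("b", "2");
        attributes.put("c", "3");
        return attributes;
    }

    // Adds a new key while the copy is iterating: the iterator's next call
    // notices the structural change and throws.
    static boolean copyFailsWhenMutated() {
        Map<String, String> attributes = sampleMap();
        int[] counter = {0};
        try {
            copyEntries(attributes, () -> attributes.put("extra-" + counter[0]++, "x"));
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Leaves the map alone during the copy: no exception is expected.
    static boolean copyFailsWhenUntouched() {
        Map<String, String> attributes = sampleMap();
        try {
            copyEntries(attributes, () -> { });
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("mutated during copy, CME thrown: " + copyFailsWhenMutated());
        System.out.println("untouched during copy, CME thrown: " + copyFailsWhenUntouched());
    }
}
```

In a real job the modification would come from another thread rather than from a hook, so the failure would be intermittent, which fits the "periodically" in the description. A possible place to look, under that assumption, is any code that keeps a reference to a record's map and changes it after the record has been emitted.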

Currently there is only one node, and checkpointing is configured to use the local file system:

state.backend: filesystem
state.checkpoints.dir: file://opt/flink/checkpoints

I am not at all sure how to approach this error.

This is Flink 1.9.1.

0 answers:

No answers