Flink on K8s loses state: recovering the job when the JobManager/TaskManager crashes

Asked: 2020-06-09 08:40:41

Tags: apache-flink flink-streaming flink-cep

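While running a Flink job cluster (Deployment/pods) on Kubernetes, we deleted the JobManager and TaskManager pods (kubectl delete pod XXX). Once the new pods were up and running, we found that the state the previous pods had kept under the RocksDB and checkpoint file paths was lost. Are there any suggestions for restoring the state after the pods come back up? I double-checked the code and found that checkpointing was not enabled. Is that the root cause of the job failing to recover?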

The environment is set up as follows:

    // second argument (true) enables incremental checkpoints
    RocksDBStateBackend backend = new RocksDBStateBackend(checkPointDataUri + "/checkpoint", true);
    // local working directory for the RocksDB files
    backend.setDbStoragePath(checkPointDataUri + "/RocksDB");
    backend.setNumberOfTransferingThreads(1);

    // add state backend
    env.setStateBackend((StateBackend) backend);

Can we enable checkpointing as follows?

    env.enableCheckpointing(1000);
    env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
    env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500);
    env.getCheckpointConfig().setCheckpointTimeout(60000);
    env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
    env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

Below is the log from the restart.

2020-06-09 06:48:11,921 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,921 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,962 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,941 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,962 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,941 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,963 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,921 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,963 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,963 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,963 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,942 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,965 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,961 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,965 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,942 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,981 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,944 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}

1 answer:

Answer 0 (score: 0)

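Storing RocksDB and the checkpoints on the same file system does not make sense. RocksDB should use the fastest local file system available; Kubernetes ephemeral storage is fine for that. Checkpoints, on the other hand, must be kept in durable storage on some kind of distributed file system.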
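
As a rough illustration of the point above, here is a minimal sketch of a startup sanity check that flags a checkpoint URI whose scheme is local. The class `CheckpointPathCheck`, the method `isLocalScheme`, and the `hdfs://` address are hypothetical and are not part of Flink or of the question's code. A `file:` URI is not necessarily wrong (it may point at a shared persistent volume mounted into every pod), so the check only signals that the path deserves a closer look.

```java
import java.net.URI;

// Hypothetical helper, not part of Flink: flags checkpoint URIs that
// resolve to the pod's own file system unless a shared volume backs them.
class CheckpointPathCheck {

    /** Returns true when the URI has no scheme or uses the "file" scheme. */
    static boolean isLocalScheme(String checkpointUri) {
        String scheme = URI.create(checkpointUri).getScheme();
        return scheme == null || scheme.equalsIgnoreCase("file");
    }

    public static void main(String[] args) {
        // Checkpoint path taken from the restart log in the question.
        String fromLog = "file:/opt/flink/data/ss-kpi-ewfj8/checkpoint";
        // Assumed example of a distributed file system URI (made-up address).
        String distributed = "hdfs://namenode:8020/flink/checkpoints";

        for (String uri : new String[] {fromLog, distributed}) {
            if (isLocalScheme(uri)) {
                System.out.println(uri + ": local scheme, verify it is on shared persistent storage");
            } else {
                System.out.println(uri + ": non-local scheme");
            }
        }
    }
}
```

In the question's setup, this would correspond to passing a durable URI as the first argument of the `RocksDBStateBackend` constructor, while `setDbStoragePath` keeps pointing at a fast pod-local directory.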