Updating the key specified in keyBy()

Time: 2019-08-08 13:03:24

Tags: apache-flink flink-streaming

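I have a Flink streaming job running in production, and I need to change its main transformation code.

The code in production currently looks like this: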

stream
   .filter(inboundData -> inboundData.hasToBeFiltered())
   .uid("filtered-data")
   .keyBy(data -> data.getMyStringKey())
   .process(doSomething())
   .uid("processed-inbound-data-id");

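I need to change how the keyBy() operator partitions the data by using a different property of the inboundData POJO. The property currently in use is a String, while the new property is a Long.

So the new code looks like this: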

stream
   .filter(inboundData -> inboundData.hasToBeFiltered())
   .uid("filtered-data")
   .keyBy(data -> data.getMyLongKey())
   .process(doSomething())
   .uid("processed-inbound-data-id");

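I made the change above and tried to submit the updated job to my Flink cluster, restoring the operator state from a savepoint taken before cancelling the old job, but I got the following error: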

java.lang.Exception: Exception while creating StreamOperatorStateContext.
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
    at java.lang.Thread.run(Thread.java:748)

    Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for LegacyKeyedProcessOperator_632e4c67d1f4899514828b9c5059a9bb_(1/1) from any of the 1 provided restore options.
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
    ... 5 more

    Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught unexpected exception.
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:324)
    at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:520)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291)
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
    ... 7 more

    Caused by: org.apache.flink.util.StateMigrationException: The new key serializer must be compatible.
    at org.apache.flink.contrib.streaming.state.restore.AbstractRocksDBRestoreOperation.readMetaData(AbstractRocksDBRestoreOperation.java:194)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBFullRestoreOperation.restoreKVStateMetaData(RocksDBFullRestoreOperation.java:170)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBFullRestoreOperation.restoreKeyGroupsInStateHandle(RocksDBFullRestoreOperation.java:157)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBFullRestoreOperation.restore(RocksDBFullRestoreOperation.java:141)
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:268)
    ... 11 more

From the stack trace, I infer that the error is caused by changing the type of the key used in the keyBy() operator.
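To illustrate why a change of key type is enough to block the restore, here is a minimal plain-Java sketch. It assumes that keyed state is stored under the serialized bytes of the key and that the restore compares the saved key type with the new one. The class KeySerializerSketch and its helpers are hypothetical stand-ins, not Flink classes, and the byte encodings are not the ones Flink actually uses.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: none of these helpers are Flink classes or Flink's real encodings.
class KeySerializerSketch {

    // Hypothetical stand-in for a String key serializer (UTF-8 bytes).
    static byte[] serializeStringKey(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical stand-in for a Long key serializer (8 big-endian bytes).
    static byte[] serializeLongKey(long key) {
        return ByteBuffer.allocate(Long.BYTES).putLong(key).array();
    }

    // Simplified restore-time check: the key type recorded in the savepoint
    // must match the key type of the job being restored.
    static void checkKeyTypeCompatible(Class<?> savedKeyType, Class<?> newKeyType) {
        if (!savedKeyType.equals(newKeyType)) {
            throw new IllegalStateException(
                "Key type changed from " + savedKeyType.getSimpleName()
                    + " to " + newKeyType.getSimpleName());
        }
    }

    public static void main(String[] args) {
        byte[] oldBytes = serializeStringKey("42");
        byte[] newBytes = serializeLongKey(42L);
        System.out.println(Arrays.toString(oldBytes));
        System.out.println(Arrays.toString(newBytes));
        // The same logical value produces different bytes under the two key types.
        System.out.println(Arrays.equals(oldBytes, newBytes)); // prints "false"

        try {
            checkKeyTypeCompatible(String.class, Long.class);
        } catch (IllegalStateException e) {
            System.out.println("Restore rejected: " + e.getMessage());
        }
    }
}
```

Under these assumptions, state written under a String key cannot be found by a lookup with a Long key, so rejecting the restore is safer than silently starting with empty state.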

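I tried experimenting with the code and searching for questions on the topic, but I could not find any useful information on how to make the change I need.

So my questions are:

  • Can the change I am trying to make be done without losing the saved state?
  • If so, can anyone give me a hint on how to make this kind of change?

Thank you very much.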

1 answer:

Answer 0 (score: 0)

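I think you should be able to use the State Processor API (soon to be released as part of Flink 1.9) to write a DataSet program that reads the savepoint taken with the old version and writes a new savepoint that is compatible with the new version.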
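The core of such a migration program is a re-keying step: read every state entry stored under the old String key, work out the new Long key for it, and write it back under that key. The sketch below shows only that logic in plain Java with in-memory maps. It does not use the State Processor API. RekeyStateSketch, Entry, and rekey are hypothetical names, and it assumes that the new Long key can be derived from the stored state itself. If it cannot, a separate mapping from old keys to new keys would be needed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Illustrative only: plain-Java model of re-keying state, not Flink API code.
class RekeyStateSketch {

    // Hypothetical per-key state: remembers the Long key it belongs to and a counter.
    static final class Entry {
        final long longKey;
        final long count;

        Entry(long longKey, long count) {
            this.longKey = longKey;
            this.count = count;
        }
    }

    // Rebuilds the state map under a new key. Several old keys may map to the
    // same new key, so colliding values are combined with the merge function.
    static <K1, K2, V> Map<K2, V> rekey(
            Map<K1, V> oldState, Function<V, K2> newKeyOf, BinaryOperator<V> merge) {
        Map<K2, V> newState = new HashMap<>();
        for (V value : oldState.values()) {
            newState.merge(newKeyOf.apply(value), value, merge);
        }
        return newState;
    }

    public static void main(String[] args) {
        // State as it might look when keyed by the String property.
        Map<String, Entry> oldState = new HashMap<>();
        oldState.put("a", new Entry(1L, 2L));
        oldState.put("b", new Entry(1L, 3L));
        oldState.put("c", new Entry(2L, 5L));

        // Re-key by the Long property, summing counters that land on the same key.
        Map<Long, Entry> newState = rekey(
            oldState,
            entry -> entry.longKey,
            (left, right) -> new Entry(left.longKey, left.count + right.count));

        System.out.println(newState.get(1L).count); // prints "5"
        System.out.println(newState.get(2L).count); // prints "5"
    }
}
```

The merge function matters because the old and new keys need not correspond one to one. How two states should be combined depends on what doSomething() stores, which the question does not show.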