Implementation of a Flink ProcessFunction that uses a Guava cache is not serializable

Time: 2019-04-02 06:02:40

Tags: scala serialization guava apache-flink flink-streaming

I implemented a ProcessFunction that uses a Guava cache to filter an incoming stream of events. The code looks like this:

object myJob {
  // Simplified sketch: parameter types and method bodies are omitted.
  private def updateCache(cacheObject, someValue) = {}
  private def getCacheValue(cacheObject, someKey) = {}

  override def run(params, executionEnv) = {
    val inputStream = executionEnv.stream

    val c = CacheBuilder.newBuilder()

    val outStream = inputStream.process(new ProcessFunction() {
      updateCache()
      getCacheValue
    })
  }
}

When I submit the job to Flink, I get the following error:

Caused by: org.apache.flink.api.common.InvalidProgramException: The implementation of the ProcessFunction is not serializable. The object probably contains or references non serializable fields.
at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:99)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.clean(StreamExecutionEnvironment.java:1560)
at org.apache.flink.streaming.api.datastream.DataStream.clean(DataStream.java:185)
at org.apache.flink.streaming.api.datastream.DataStream.process(DataStream.java:666)
at org.apache.flink.streaming.api.scala.DataStream.process(DataStream.scala:686)

Any ideas about what I am doing wrong? How can I fix this serialization error?

1 Answer:

Answer 0: (score: 0)

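The error means that the function you pass to Flink references an object that cannot be serialized. Flink serializes the ProcessFunction, together with everything it captures, in order to ship it to the cluster. If the cache is created as shown in your code, marking the field as lazy should resolve the issue, because the value is then created on first access rather than when the job is defined: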

   lazy val c = CacheBuilder.newBuilder()
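To see why deferring creation helps, here is a minimal sketch that uses only the standard library, since Flink checks user functions with Java serialization. Resource, EagerHolder, LazyHolder and LazyFieldDemo are hypothetical names standing in for the cache builder and the function that holds it. The sketch also adds @transient, which is an assumption on top of the plain lazy val above: it keeps the field out of the serialized form even if it has already been initialized.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for an object that does not implement
// java.io.Serializable (the role the cache builder plays in the question).
class Resource {
  val data = new StringBuilder("cached")
}

// Eager field: the Resource instance is part of the holder's serialized state.
class EagerHolder extends Serializable {
  val resource = new Resource
}

// @transient lazy val: the field is skipped during serialization and the
// Resource is created on first access, wherever that access happens.
class LazyHolder extends Serializable {
  @transient lazy val resource = new Resource
}

object LazyFieldDemo {
  // Returns true if Java serialization of `obj` succeeds.
  def canSerialize(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    println(canSerialize(new EagerHolder)) // fails on the Resource field
    println(canSerialize(new LazyHolder))  // the Resource field is skipped
  }
}
```

The same reasoning applies to a function object handed to Flink: whatever it holds eagerly has to be serializable, while a transient lazy field is rebuilt on the worker the first time it is used.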

In general, for cases like this, consult Flink's documentation, which explains the problem.
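Another common way to avoid the problem, not taken from the question, is to build the cache inside the function's open() method so that it is never part of the serialized function. The sketch below assumes the Flink 1.x API, where open takes a Configuration, and String events. DedupFunction, the cache size and the expiry time are hypothetical choices for illustration.

```scala
import java.util.concurrent.TimeUnit

import com.google.common.cache.{Cache, CacheBuilder}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.util.Collector

// Hypothetical filter: forwards each String event only the first time it is seen.
class DedupFunction extends ProcessFunction[String, String] {

  // Not serialized with the function; assigned in open() on the worker.
  @transient private var seen: Cache[String, java.lang.Boolean] = _

  override def open(parameters: Configuration): Unit = {
    seen = CacheBuilder.newBuilder()
      .maximumSize(10000L)                    // illustrative limit
      .expireAfterWrite(10, TimeUnit.MINUTES) // illustrative expiry
      .build[String, java.lang.Boolean]()
  }

  override def processElement(
      value: String,
      ctx: ProcessFunction[String, String]#Context,
      out: Collector[String]): Unit = {
    // getIfPresent returns null when the key is not in the cache.
    if (seen.getIfPresent(value) == null) {
      seen.put(value, java.lang.Boolean.TRUE)
      out.collect(value)
    }
  }
}
```

It would be used as inputStream.process(new DedupFunction). Note that a cache built this way is a plain in-memory object: each parallel instance of the function gets its own copy, and its contents are not included in Flink checkpoints.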