Question

This is a non-working attempt at using Flink's fold with a Scala anonymous function:

val myFoldFunction = (x: Double, t:(Double,String,String)) => x + t._1
env.readFileStream(...).
...
.groupBy(1)
.fold(0.0, myFoldFunction : Function2[Double, (Double,String,String), Double])

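It compiles fine, but at execution time I get a "type erasure problem" (see below). Doing the same thing in Java works, but is of course more verbose. I like concise, clear lambdas. How can I do this in Scala?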

Caused by: org.apache.flink.api.common.functions.InvalidTypesException:
Type of TypeVariable 'R' in 'public org.apache.flink.streaming.api.scala.DataStream org.apache.flink.streaming.api.scala.DataStream.fold(java.lang.Object,scala.Function2,org.apache.flink.api.common.typeinfo.TypeInformation,scala.reflect.ClassTag)' could not be determined. 
This is most likely a type erasure problem. 
The type extraction currently supports types with generic variables only in cases where all variables in the return type can be deduced from the input type(s).
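
For reference, the result the snippet above appears to be after can be sketched with plain Scala collections, with no Flink involved. The sample records and the FoldSketch object are assumptions made up for illustration; only myFoldFunction comes from the question. Note that this computes one final sum per key, whereas a fold on a stream would, as far as I understand it, emit a running value as elements arrive.

```scala
object FoldSketch {
  type Record = (Double, String, String)

  // Same function as in the question.
  val myFoldFunction: (Double, Record) => Double = (x, t) => x + t._1

  // Hypothetical sample input; the question reads its records from a file stream.
  val records: Seq[Record] = Seq(
    (1.5, "a", "x"),
    (2.0, "b", "y"),
    (3.5, "a", "z")
  )

  // Group on tuple position 1 (zero-based, so the first String field),
  // then fold each group starting from 0.0.
  val sums: Map[String, Double] =
    records.groupBy(_._2).map { case (key, group) =>
      key -> group.foldLeft(0.0)(myFoldFunction)
    }
}
```

With the sample data above, FoldSketch.sums holds 5.0 for key "a" and 2.0 for key "b".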

Answer 1

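The problem you are running into is a bug in Flink [1]. It originates from Flink's TypeExtractor and from the way the Scala DataStream API is implemented on top of the Java implementation. The TypeExtractor cannot generate a TypeInformation for the Scala type and therefore returns a MissingTypeInformation. This missing type information is set manually after the StreamFold operator has been created. However, the StreamFold operator is implemented in such a way that it does not accept a MissingTypeInformation, so it fails before the correct type information can be set.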
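
The ordering problem described above can be sketched as a toy model. This is not Flink code: TypeInfo, MissingTypeInfo, KnownTypeInfo, EagerFoldOperator and ErasureModel are all hypothetical names, and the sketch only assumes that the operator validates its type information at construction time, as the explanation suggests.

```scala
// Toy model of the failure mode; none of these names are Flink's own.
sealed trait TypeInfo
case object MissingTypeInfo extends TypeInfo              // placeholder when extraction fails
final case class KnownTypeInfo(name: String) extends TypeInfo

// Stand-in for an operator that rejects the placeholder in its constructor.
final class EagerFoldOperator(initial: TypeInfo) {
  require(initial != MissingTypeInfo, "type of 'R' could not be determined")
  private var current: TypeInfo = initial
  def setOutputType(t: TypeInfo): Unit = current = t
  def outputType: TypeInfo = current
}

object ErasureModel {
  // Stand-in for Java-side extraction, which cannot see the Scala types.
  def extractFromJavaSide(): TypeInfo = MissingTypeInfo

  // Mirrors the order described above: construct first, patch afterwards.
  def buildThenPatch(scalaSide: TypeInfo): Either[String, TypeInfo] =
    try {
      val op = new EagerFoldOperator(extractFromJavaSide()) // throws here
      op.setOutputType(scalaSide)                            // never reached
      Right(op.outputType)
    } catch {
      case e: IllegalArgumentException => Left(e.getMessage)
    }
}
```

In this model, calling ErasureModel.buildThenPatch(KnownTypeInfo("Double")) yields a Left, because the constructor check fires before the correct type information is ever applied.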

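I have opened a pull request [2] to fix this problem. It should be merged within the next two days. Once it is, using the latest 0.10 snapshot version should resolve your problem.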

How to use the Flink fold function in Scala
