我有以下函数将字符串映射序列展平为double。如何使类型字符串加倍泛型?
val flattenSeqOfMaps = udf { values: Seq[Map[String, Double]] => values.flatten.toMap }
flattenSeqOfMaps: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,MapType(StringType,DoubleType,false),Some(List(ArrayType(MapType(StringType,DoubleType,false),true))))
我需要类似的东西
val flattenSeqOfMaps[S,D] = udf { values: Seq[Map[S, D]] => values.flatten.toMap }
谢谢。
编辑1: 我正在使用spark 2.3。我知道Spark 2.4中的高阶函数
编辑2:我走近了一点。我需要什么来代替f _
中的val flattenSeqOfMaps = udf { f _}
。请在下面比较joinMap
类型签名和flattenSeqOfMaps
类型签名
scala> val joinMap = udf { values: Seq[Map[String, Double]] => values.flatten.toMap }
joinMap: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,MapType(StringType,DoubleType,false),Some(List(ArrayType(MapType(StringType,DoubleType,false),true))))
scala> def f[S,D](values: Seq[Map[S, D]]): Map[S,D] = { values.flatten.toMap}
f: [S, D](values: Seq[Map[S,D]])Map[S,D]
scala> val flattenSeqOfMaps = udf { f _}
flattenSeqOfMaps: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,MapType(NullType,NullType,true),Some(List(ArrayType(MapType(NullType,NullType,true),true))))
编辑3:以下代码对我有用。
scala> val flattenSeqOfMaps = udf { f[String,Double] _}
flattenSeqOfMaps: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,MapType(StringType,DoubleType,false),Some(List(ArrayType(MapType(StringType,DoubleType,false),true))))
答案 0 :(得分:1)
虽然您可以将函数定义为
import scala.reflect.runtime.universe.TypeTag
def flattenSeqOfMaps[S : TypeTag, D: TypeTag] = udf {
values: Seq[Map[S, D]] => values.flatten.toMap
}
然后使用特定实例:
val df = Seq(Seq(Map("a" -> 1), Map("b" -> 1))).toDF("val")
val flattenSeqOfMapsStringInt = flattenSeqOfMaps[String, Int]
df.select($"val", flattenSeqOfMapsStringInt($"val") as "val").show
+--------------------+----------------+
| val| val|
+--------------------+----------------+
|[[a -> 1], [b -> 1]]|[a -> 1, b -> 1]|
+--------------------+----------------|
还可以使用内置函数,而无需显式泛型:
import org.apache.spark.sql.functions.{expr, flatten, map_from_arrays}
def flattenSeqOfMaps_(col: String) = {
val keys = flatten(expr(s"transform(`$col`, x -> map_keys(x))"))
val values = flatten(expr(s"transform(`$col`, x -> map_values(x))"))
map_from_arrays(keys, values)
}
df.select($"val", flattenSeqOfMaps_("val") as "val").show
+--------------------+----------------+
| val| val|
+--------------------+----------------+
|[[a -> 1], [b -> 1]]|[a -> 1, b -> 1]|
+--------------------+----------------+
答案 1 :(得分:1)
以下代码对我有用。
scala> def f[S,D](values: Seq[Map[S, D]]): Map[S,D] = { values.flatten.toMap}
f: [S, D](values: Seq[Map[S,D]])Map[S,D]
scala> val flattenSeqOfMaps = udf { f[String,Double] _}
flattenSeqOfMaps: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,MapType(StringType,DoubleType,false),Some(List(ArrayType(MapType(StringType,DoubleType,false),true))))