Question

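I'm trying to create a list of functions to map over a DataFrame, but even after searching I can't figure out how to put fully qualified function names into the list. Even though it compiles, I'm fairly sure math.min and math.max are not what I want, because the functions I actually run come from the org.apache.spark.sql.functions._ import.

How do I create a list of functions from a specific import?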

import org.apache.spark.sql.functions._

// This works - map each function over the DF columns
df.select(df.columns.map(mean): _*).show
df.select(df.columns.map(max): _*).show
df.select(df.columns.map(min): _*).show  

val functions = Array(math.min _, math.max _) // this isn't throwing errors  
/*****************************************************************************/  
// These attempts to create function lists don't work
val functions = Array(org.apache.spark.sql.functions.mean _, math.min _, math.max _) // won't compile  
val functions = Array(_ => org.apache.spark.sql.functions.mean(_), math.min _, math.max _) // doesn't work

// apply each function to the columns and then combine into one dataframe
functions.map(f => df.select(numeric_df.columns.map(f): _*)).reduce(_ union _).show

Answer 1

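If you want to create a list containing the constants a, b, ..., z, then:

Make sure the constants are in scope (for example by importing them)
Put them in a list

Something like this: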

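import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{mean, min, max}

// Each element takes a column name and returns a Column expression
val functions: Array[String => Column] = 
  Array(mean(_: String), min(_: String), max(_: String))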

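The explicit type annotations in the eta-expansions are necessary because the methods mean, min and max are overloaded (there is both a mean(colName: String) and a mean(c: Column)).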
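To see the overloading issue in isolation, here is a minimal sketch in plain Scala that needs no Spark dependency. Col and Fns are hypothetical stand-ins for Spark's Column and org.apache.spark.sql.functions, invented only for this illustration; the point is that the (_: String) annotation states explicitly which overload each list element refers to.

```scala
// Hypothetical stand-in for Spark's Column (not the real class).
final case class Col(expr: String)

// Hypothetical stand-in for org.apache.spark.sql.functions:
// each method is overloaded on Col and on String, as in Spark.
object Fns {
  def mean(c: Col): Col = Col(s"avg(${c.expr})")
  def mean(name: String): Col = mean(Col(name))
  def max(c: Col): Col = Col(s"max(${c.expr})")
  def max(name: String): Col = max(Col(name))
}

object OverloadDemo {
  import Fns._

  // The parameter type in each function literal selects the String overload.
  val functions: Array[String => Col] =
    Array(mean(_: String), max(_: String))

  def main(args: Array[String]): Unit = {
    val columns = Seq("a", "b")
    // Apply every function in the list to every column name.
    val exprs = functions.toSeq.map(f => columns.map(f))
    exprs.foreach(row => println(row.map(_.expr).mkString(", ")))
  }
}
```

Swapping (_: String) for (_: Col) and changing the element type to Col => Col would select the other overload in the same way.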

These functions of course have nothing to do with math.max and the like; they are spark-sql functions that can be applied to columns.
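As a rough sketch of how such a list could then be applied the way the question intends, the following maps each function over the column names and unions the results. It assumes a Spark dependency on the classpath and a local session; the sample data, the app name and the object name FunctionListDemo are made up for illustration.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{mean, min, max}

object FunctionListDemo {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for this illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("function-list")
      .getOrCreate()
    import spark.implicits._

    // Made-up numeric data with two columns.
    val df: DataFrame = Seq((1.0, 10.0), (2.0, 20.0), (3.0, 30.0)).toDF("a", "b")

    // Column-name based overloads of the spark-sql aggregate functions.
    val functions: Array[String => Column] =
      Array(mean(_: String), min(_: String), max(_: String))

    // One single-row DataFrame per function, stacked with union.
    val summary = functions
      .map(f => df.select(df.columns.map(f): _*))
      .reduce(_ union _)

    summary.show()
    spark.stop()
  }
}
```

Because union matches columns by position, the combined result keeps the column names of the first select, so the rows are distinguishable only by their order in the functions array.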

Scala: creating a list of functions from an import