Code:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.Column

def func(rawDF: DataFrame, primaryKey: Column, orderKey: Column): DataFrame = {
  // some processing that produces newDf
  val newDf: DataFrame = ??? // placeholder for the actual processing
  newDf
}
I am trying to use the function above to create a new, processed DataFrame from the existing raw DataFrame.
Code:
var processedDF = func(rawDF,"col1","col2")
Error:
<console>:73: error: type mismatch;
found : String("col1")
required: org.apache.spark.sql.Column
var processedDF = func(rawDF,"col1","col2")
^
Any suggestions on how to convert the function arguments from String to org.apache.spark.sql.Column?
Answer (score: 1)
Either use col:
import org.apache.spark.sql.functions.col
func(rawDF, col("col1"), col("col2"))
or look the column up on the DataFrame itself:
func(rawDF, rawDF("col1"), rawDF("col2"))
or create the Column directly with the $ string interpolator (where spark is your SparkSession object):
import spark.implicits.StringToColumn
func(rawDF, $"col1", $"col2")
or pass a Symbol (note that symbol literals such as 'col1 are deprecated as of Scala 2.13):
import spark.implicits.symbolToColumn
func(rawDF, 'col1, 'col2)
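If you would rather keep calling func with plain column names, as in the question, another option (not part of the answer, just a sketch under assumptions) is to add an overload that accepts Strings, converts them with col(), and delegates to the Column version. The object name FuncOverloadSketch, the select/orderBy body standing in for "//some process", and the sample data are all hypothetical placeholders.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object FuncOverloadSketch {
  // Column-based version. The body is a hypothetical stand-in for "//some process":
  // keep the two key columns and sort by the order key.
  def func(rawDF: DataFrame, primaryKey: Column, orderKey: Column): DataFrame =
    rawDF.select(primaryKey, orderKey).orderBy(orderKey)

  // Hypothetical String overload: turns the column names into Columns and delegates.
  def func(rawDF: DataFrame, primaryKey: String, orderKey: String): DataFrame =
    func(rawDF, col(primaryKey), col(orderKey))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("func-overload-sketch").getOrCreate()
    import spark.implicits._

    val rawDF = Seq((2, "b"), (1, "a")).toDF("col1", "col2")
    // Resolves to the String overload, so the original call compiles unchanged.
    val processedDF = func(rawDF, "col1", "col2")
    processedDF.show()

    spark.stop()
  }
}
```

Both overloads should be defined in the same object (or entered together in one :paste block in spark-shell), because separately entered REPL definitions with the same name shadow each other rather than overload.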