Converting a String to a Spark Column type

Date: 2019-02-23 01:08:00

Tags: scala, apache-spark

Code:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.Column

def func(rawDF: DataFrame, primaryKey: Column, orderKey: Column): DataFrame = {
  // some processing that builds newDf from rawDF
  newDf
}

I am trying to use the function above to create a new, processed DF from an existing raw DF.

Code:

var processedDF  = func(rawDF,"col1","col2")

Error:

<console>:73: error: type mismatch;
found   : String("col1")
required: org.apache.spark.sql.Column
   var processedDF  = func(rawDF,"col1","col2")
                                     ^

Any suggestions on how to change the type of the function arguments from String to org.apache.spark.sql.Column?

1 answer:

Answer 0 (score: 1)

Either

import org.apache.spark.sql.functions.col

func(rawDF, col("col1"), col("col2"))

or

func(rawDF, rawDF("col1"), rawDF("col2"))
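Both forms compile because col("col1") and rawDF("col1") each produce an org.apache.spark.sql.Column. A minimal end-to-end sketch of this approach; the function body and the sample data here are assumptions added only so the example runs:

import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical body (assumption): simply sorts by the two columns so the sketch is runnable.
def func(rawDF: DataFrame, primaryKey: Column, orderKey: Column): DataFrame =
  rawDF.orderBy(primaryKey, orderKey)

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val rawDF = spark.createDataFrame(Seq((1, "b"), (2, "a"))).toDF("col1", "col2")

// Both calls now pass Column values, so the types line up.
val processed1 = func(rawDF, col("col1"), col("col2"))
val processed2 = func(rawDF, rawDF("col1"), rawDF("col2"))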

or provide the Column directly with $ (where spark is the SparkSession object):

import spark.implicits.StringToColumn

func(rawDF, $"col1", $"col2")
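Note that spark.implicits is a member of a concrete SparkSession value, so a session named spark must already be in scope before that import. A minimal sketch of the setup, where the local master is an assumption used only for illustration:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")   // assumption: a local session just for this example
  .getOrCreate()

import spark.implicits.StringToColumn   // enables the $"colName" syntax

val processedDF = func(rawDF, $"col1", $"col2")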

or with a Symbol:

import spark.implicits.symbolToColumn

func(rawDF, 'col1, 'col2)
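Alternatively, if the call sites should keep passing plain strings, the function itself can accept String parameters and convert them to Columns internally with col. This is a sketch of that variant, not the answer's approach, and its body is a placeholder assumption:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

def func(rawDF: DataFrame, primaryKey: String, orderKey: String): DataFrame = {
  val pk = col(primaryKey)   // convert the String names to Columns inside the function
  val ok = col(orderKey)
  // ... same processing as before, using pk and ok
  rawDF.orderBy(pk, ok)      // placeholder body, assumption only
}

val processedDF = func(rawDF, "col1", "col2")   // the original call now compiles unchanged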