How to extract a String from a Column data type in Spark Scala?

Asked: 2017-07-13 11:47:23

Tags: scala apache-spark apache-spark-sql spark-dataframe

I have a function that takes a String argument and performs a match on it to determine the return value, like this -

Edit (complete function):

def getSubscriptionDaysFunc(account_status: Column, created_at: org.apache.spark.sql.Column, updated_at: org.apache.spark.sql.Column): org.apache.spark.sql.Column = {
  account_status match {
    case "expired"   => datediff(updated_at, created_at)
    case "cancelled" => datediff(updated_at, created_at)
    case "active"    => datediff(updated_at, current_date())
    case default     => null
  }
}

I call this function like this -

df.withColumn("subscription_days", getSubscriptionDaysFunc($"account_status",$"created_at",$"updated_at"))

Here $"account_status" returns a Column value. How do I get the String value out of the Column object?

Edit: I have also tried writing it as a UDF in the following way -

val getSubscriptionDaysFunc = udf((account_status: String, created_at: org.apache.spark.sql.Column, updated_at: org.apache.spark.sql.Column): Column => {
  account_status match {
    case "expired"   => datediff(updated_at, created_at)
    case "cancelled" => datediff(updated_at, created_at)
    case "active"    => datediff(updated_at, current_date())
    case default     => null
  }
})

This gives the error -

  

"错误:非法开始申报account_status匹配{"

1 Answer:

Answer 0 (score: 1)

I think what you want to do is implement a UDF:

import org.apache.spark.sql.functions.udf

val getSubscriptionDaysFunc = udf((account_status: String) => {
  account_status match {
    case "expired"   => //some logic
    case "cancelled" => //some logic
    case "active"    => //some logic
    case _           => null
  }
})

df.withColumn("subscription_days", getSubscriptionDaysFunc($"account_status"))
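
For reference, the date-difference logic from the question can also be expressed entirely with built-in column expressions (when/otherwise), so no String ever has to be pulled out of a Column. A UDF only receives the underlying row values (e.g. a String), never Column objects, which is why the second attempt in the question does not compile. Below is a minimal sketch assuming the column names from the question; the helper name getSubscriptionDays is only illustrative.

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{when, datediff, current_date, lit}

// Build a single column expression; the branching is evaluated per row on the executors.
def getSubscriptionDays(account_status: Column, created_at: Column, updated_at: Column): Column =
  when(account_status === "expired" || account_status === "cancelled", datediff(updated_at, created_at))
    .when(account_status === "active", datediff(updated_at, current_date()))
    .otherwise(lit(null))  // mirrors the original "case default => null"

df.withColumn("subscription_days", getSubscriptionDays($"account_status", $"created_at", $"updated_at"))

Because everything stays a Column expression, no UDF is needed and Catalyst can optimize the whole computation.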