I have a function that takes a String parameter and performs a "match" on it to determine the return value, like this -
Edit (complete function):
def getSubscriptionDaysFunc(account_status: Column, created_at: org.apache.spark.sql.Column, updated_at: org.apache.spark.sql.Column): org.apache.spark.sql.Column = {
  account_status match {
    case "expired" => datediff(updated_at, created_at)
    case "cancelled" => datediff(updated_at, created_at)
    case "active" => datediff(updated_at, current_date())
    case default => null
  }
}
I call this function like this -
df.withColumn("subscription_days", getSubscriptionDaysFunc($"account_status",$"created_at",$"updated_at"))
Here $"account_status" returns a Column value. How do I get the String value out of the Column object?
Edit: I also tried writing it as a UDF, like this -
val getSubscriptionDaysFunc = udf((account_status: String, created_at: org.apache.spark.sql.Column, updated_at: org.apache.spark.sql.Column): Column => {
  account_status match {
    case "expired" => datediff(updated_at, created_at)
    case "cancelled" => datediff(updated_at, created_at)
    case "active" => datediff(updated_at, current_date())
    case default => null
  }
})
This gives the error -
"error: illegal start of declaration account_status match {"
Answer 0: (score: 1)
I think what you want to do is implement a UDF:
import org.apache.spark.sql.functions.udf
val getSubscriptionDaysFunc = udf((account_status: String) => {
  account_status match {
    case "expired" => //some logic
    case "cancelled" => //some logic
    case "active" => //some logic
    case default => null
  }
})
df.withColumn("subscription_days", getSubscriptionDaysFunc($"account_status"))
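A UDF receives plain Scala values for each row, so it cannot call Column functions such as datediff inside its body. If the goal is to keep the datediff logic from the question, one alternative worth considering is to skip the UDF and build the branching as a Column expression with when/otherwise. The following is only a minimal sketch under some assumptions: the object name SubscriptionDays and the function name getSubscriptionDaysExpr are made up for illustration, created_at and updated_at are assumed to be date (or date-castable) columns, and the argument order of datediff in the "active" branch is copied from the question unchanged.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{current_date, datediff, lit, when}

// Hypothetical helper object; the name is not from the original post.
object SubscriptionDays {

  // Builds a Column expression that Spark evaluates per row, so no String
  // ever has to be extracted from the Column on the driver side.
  def getSubscriptionDaysExpr(accountStatus: Column,
                              createdAt: Column,
                              updatedAt: Column): Column =
    // "expired" and "cancelled" share the same logic in the question.
    when(accountStatus.isin("expired", "cancelled"), datediff(updatedAt, createdAt))
      // Argument order kept exactly as written in the question.
      .when(accountStatus === "active", datediff(updatedAt, current_date()))
      // Any other status yields NULL, mirroring `case default => null`.
      .otherwise(lit(null))
}
```

It would be called the same way as in the question, for example df.withColumn("subscription_days", SubscriptionDays.getSubscriptionDaysExpr($"account_status", $"created_at", $"updated_at")). Because the result is an ordinary Column expression, Spark can optimize it, which is usually not possible for the body of a UDF.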