Scala: conditional logic based on several columns

Time: 2017-06-09 17:26:38

Tags: scala apache-spark

I am trying to create a column whose value is based on several other columns:

val zzz = sc.parallelize(Seq(("2016-06-23", "VFF", "NO"), ("2016-06-23", 
null, "NO"), ("2016-01-23", "VFF", "NO"), ("2016-01-23", null, "NO")))
.toDF("last_ts", "fa_disposition", "vfir_scrap")

val newCol = when(to_date(col("last_ts")) >= "2016-06-01" && 
 col("fa_disposition").isNull(), 1)
.when(col("fa_disposition")=="VFF" && col("vfir_scrap")=="NO", -1)
.otherwise(0);    

val hdd3=zzz.withColumn("failure", newCol)

However, I get this error:

> error: type mismatch;
  found   : Boolean
 required: org.apache.spark.sql.Column
           .when(col("fa_disposition")=="VFF" && col("vfir_scrap")=="NO", -1)

I have tried searching, reading the Column documentation and so on, but I don't understand this. Please help!

3 answers:

Answer 0 (score: 1)

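You need to use the === (equality test) method of Column instead of Scala's ==. Scala's == always returns a Boolean, whereas when(...) expects a Column as its condition; === builds a Column instead:

.when(col("fa_disposition")==="VFF" && col("vfir_scrap")==="NO", -1)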
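To see why the compiler reports "found: Boolean, required: Column", here is a toy model in plain Scala. Expr, ColRef, Literal, EqualTo, And and ToyDsl are hypothetical stand-ins invented for illustration, not Spark's real classes. The only idea carried over from Spark is that === builds an unevaluated expression, while == is Scala's final equality method on Any and always yields a Boolean.

```scala
// Toy expression DSL (hypothetical; NOT Spark's API) illustrating === vs ==.
// Expr plays the role of org.apache.spark.sql.Column: a description of a
// computation that would be evaluated later, row by row.
trait Expr {
  // Like Column.===, this builds an EqualTo node instead of comparing now.
  def ===(other: Any): Expr = other match {
    case e: Expr => EqualTo(this, e)
    case v       => EqualTo(this, Literal(v)) // wrap plain values, roughly like lit()
  }
  // Like Column.&&, this combines two expressions into an And node.
  def &&(other: Expr): Expr = And(this, other)
}
final case class ColRef(name: String) extends Expr
final case class Literal(value: Any) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr
final case class And(left: Expr, right: Expr) extends Expr

object ToyDsl {
  def col(name: String): Expr = ColRef(name)

  // === and && build an expression tree that a when(...)-style API could accept.
  val condition: Expr = col("fa_disposition") === "VFF" && col("vfir_scrap") === "NO"

  // == is ordinary Scala equality: it compares the two objects immediately
  // and returns a Boolean, which is what when(...) rejects.
  val plainEquality: Boolean = col("fa_disposition") == Literal("VFF")
}
```

Because === begins with =, Scala gives it higher precedence than &&, so the condition groups as (a === "VFF") && (b === "NO") without extra parentheses.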

Answer 1 (score: 1)

You have to replace == with === (column equality), and isNull() with isNull (Column.isNull is declared without a parameter list):

import org.apache.spark.sql.functions.{col, lit, to_date, when}
import sqlContext.implicits._ // provides .toDF on RDDs of tuples

val zzz = sc.parallelize(Seq(
    ("2016-06-23", "VFF", "NO"), ("2016-06-23", null, "NO"),
    ("2016-01-23", "VFF", "NO"), ("2016-01-23", null, "NO")))
  .toDF("last_ts", "fa_disposition", "vfir_scrap")

// === builds a Column (unlike ==), and isNull takes no parentheses
val newCol = when(to_date(col("last_ts")) >= lit("2016-06-01") &&
    col("fa_disposition").isNull, 1)
  .when(col("fa_disposition") === "VFF" && col("vfir_scrap") === "NO", -1)
  .otherwise(0)

val hdd3 = zzz.withColumn("failure", newCol)
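How does the when(...).when(...).otherwise(...) chain treat the rows with a null fa_disposition? In Spark SQL, === with a NULL operand returns NULL, AND follows SQL three-valued logic, and a when branch is taken only when its condition is true; branches are tried in order. The plain-Scala sketch below models those rules with Option[Boolean] (None standing for NULL). ThreeValued, sqlEq, sqlAnd, caseWhen and failure are hypothetical helpers for illustration, not Spark functions. The sketch also assumes yyyy-MM-dd strings, which compare chronologically as text, in place of Spark's date comparison.

```scala
// Plain-Scala model (hypothetical helpers, not Spark's API) of how the
// when/otherwise chain above would be evaluated for a single row.
object ThreeValued {
  // None stands for SQL NULL.
  type SqlBool = Option[Boolean]

  // Like ===: comparing NULL with anything yields NULL.
  def sqlEq(a: Option[String], b: String): SqlBool = a.map(_ == b)

  // SQL AND: FALSE wins; otherwise the result is NULL if either side is NULL.
  def sqlAnd(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // Branches are tried in order; only a TRUE condition fires (NULL does not).
  def caseWhen(branches: Seq[(SqlBool, Int)], otherwise: Int): Int =
    branches.collectFirst { case (Some(true), value) => value }.getOrElse(otherwise)

  // Assumes ISO yyyy-MM-dd strings, which compare chronologically as text.
  def failure(lastTs: Option[String], faDisposition: Option[String], vfirScrap: Option[String]): Int = {
    val recent: SqlBool = lastTs.map(_.compareTo("2016-06-01") >= 0)
    val missing: SqlBool = Some(faDisposition.isEmpty) // isNull itself is never NULL
    caseWhen(
      Seq(
        sqlAnd(recent, missing) -> 1,
        sqlAnd(sqlEq(faDisposition, "VFF"), sqlEq(vfirScrap, "NO")) -> -1
      ),
      otherwise = 0
    )
  }
}
```

For the four sample rows the sketch returns -1, 1, -1 and 0. In the last row the second condition is NULL rather than false, and the chain still falls through to otherwise(0).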

Answer 2 (score: 0)

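Below is a solution that uses a udf function. UDFs require the data to be serialized and deserialized, so they are not recommended when the built-in SQL functions are enough to solve the problem. @Raphael Roth's answer is therefore the ideal choice for this case.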

This solution is posted only for general knowledge, to show that the solutions above can also be implemented with a udf function:

import sqlContext.implicits._
import org.apache.spark.sql.functions._

val zzz = sc.parallelize(Seq(("2016-06-23", "VFF", "NO"), ("2016-06-23",
  null, "NO"), ("2016-01-23", "VFF", "NO"), ("2016-01-23", null, "NO")))
  .toDF("last_ts", "fa_disposition", "vfir_scrap")

def failure = udf((last_ts: String, fa_disposition: String, vfir_scrap: String) => {
  if ((last_ts >= "2016-06-01") && fa_disposition == null) 1 // >= to match the column-based version
  else if((fa_disposition == "VFF") && vfir_scrap == "NO") -1
  else 0
})

val hdd3 = zzz.withColumn("failure", failure($"last_ts", $"fa_disposition", $"vfir_scrap"))
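One side benefit of the UDF approach is that the branching is an ordinary Scala function. A minimal sketch of exploiting that: pull the logic into a named function that can be unit-tested without a SparkSession and then wrapped with udf(...). FailureRule and compute are hypothetical names. The sketch assumes the rule should mirror the >= comparison of the column-based version, and it wraps last_ts in Option(...) so that a null value counts as "not recent" instead of throwing the NullPointerException the string comparison in the UDF above would raise.

```scala
// Hypothetical helper: the UDF's branching as a plain, Spark-free function.
object FailureRule {
  // ISO yyyy-MM-dd strings compare chronologically as plain text.
  private val Cutoff = "2016-06-01"

  def compute(lastTs: String, faDisposition: String, vfirScrap: String): Int = {
    // Option(null) is None, so a null last_ts is treated as "not recent"
    // instead of throwing a NullPointerException.
    val recent = Option(lastTs).exists(_.compareTo(Cutoff) >= 0)
    if (recent && faDisposition == null) 1
    // == on Strings is null-safe in Scala, so null values simply fail these checks.
    else if (faDisposition == "VFF" && vfirScrap == "NO") -1
    else 0
  }
}

// With Spark on the classpath, it could then be wrapped as before:
// val failure = udf(FailureRule.compute _)
```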