Case when/otherwise on a Spark DataFrame

Asked: 2019-04-03 11:47:41

Tags: scala apache-spark

I wrote this:

val result = df.withColumn("Ind", when($"color" === "Green", 1).otherwise(0))

I want to extend the condition $"color" === "Green" to $"color" in ["GREEN", "RED", "YELLOW"]

Any idea how to do this?

2 answers:

Answer 0 (score: 3)

You can use

$"color".isin("GREEN","RED","YELLOW")

Code example:

val df2 = df.withColumn("Ind",
  when($"color".isin("GREEN", "RED", "YELLOW"), 1).otherwise(0))
df2.show(false)

Output:

+------+---+
| color|Ind|
+------+---+
|   RED|  1|
| GREEN|  1|
|YELLOW|  1|
|  PINK|  0|
+------+---+
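For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes a local SparkSession and a four-row sample DataFrame like the one in the output above. The object name IsinExample and the helper withIndicator are hypothetical and only serve the illustration. It also shows how to pass a Scala collection to isin, which takes varargs, by expanding it with `: _*`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object IsinExample {
  // Hypothetical helper: adds "Ind" = 1 when "color" is one of `colors`, else 0.
  def withIndicator(df: DataFrame, colors: Seq[String]): DataFrame =
    // isin takes varargs, so a Seq has to be expanded with `: _*`.
    // On Spark 2.4+, col("color").isInCollection(colors) is an alternative.
    df.withColumn("Ind", when(col("color").isin(colors: _*), 1).otherwise(0))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("isin-example")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample data, mirroring the rows in the output above.
    val df = Seq("RED", "GREEN", "YELLOW", "PINK").toDF("color")

    withIndicator(df, Seq("GREEN", "RED", "YELLOW")).show(false)

    spark.stop()
  }
}
```

Keeping the list in a Seq makes it easy to load the allowed values from configuration instead of hard-coding them in the expression.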

A quick search turned up a similar question that has already been answered on Stack Overflow: Spark SQL - IN clause
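If you prefer the SQL spelling that the linked question is about, the same indicator can probably be written as a SQL expression string and passed to functions.expr. This is only a sketch under the same assumptions as the code above (a string column named color). The object name SqlInExample and the helper withIndicatorSql are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

object SqlInExample {
  // Hypothetical helper: same logic as when(...isin(...)).otherwise(0),
  // written as a SQL CASE WHEN with an IN predicate.
  def withIndicatorSql(df: DataFrame): DataFrame =
    df.withColumn(
      "Ind",
      expr("CASE WHEN color IN ('GREEN', 'RED', 'YELLOW') THEN 1 ELSE 0 END"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session and made-up sample rows.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sql-in-example")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("RED", "GREEN", "YELLOW", "PINK").toDF("color")
    withIndicatorSql(df).show(false)

    spark.stop()
  }
}
```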

Answer 1 (score: 0)

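You should be able to check the column against a list of values like this:

// Column.in was deprecated in Spark 1.5 in favor of isin and removed in 2.0,
// so isin is used here.
val result = df.withColumn("Ind",
  when($"color".isin("GREEN", "RED", "YELLOW"), 1).otherwise(0))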
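One thing to watch: the question compares against "Green" while the list uses "GREEN", and isin matches strings case-sensitively. If the data may contain mixed case, one option is to normalize the column with upper before the check. The sketch below assumes that is the desired behavior. The object name CaseInsensitiveIsin and the helper withIndicatorIgnoreCase are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, upper, when}

object CaseInsensitiveIsin {
  // Hypothetical helper: upper-cases both the column and the allowed values
  // so that "Green", "green" and "GREEN" all match.
  def withIndicatorIgnoreCase(df: DataFrame, colors: Seq[String]): DataFrame = {
    val normalized = colors.map(_.toUpperCase)
    df.withColumn("Ind", when(upper(col("color")).isin(normalized: _*), 1).otherwise(0))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session and made-up sample rows.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("isin-ignore-case")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("Green", "red", "YELLOW", "Pink").toDF("color")
    withIndicatorIgnoreCase(df, Seq("GREEN", "RED", "YELLOW")).show(false)

    spark.stop()
  }
}
```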