I am trying to combine if-else logic with window functions in Spark.
Input DF:

col1  col2  TimeStamp1  TimeStamp2
10    1     10:00       11:00
20    1     2:00        3:00
20    2     4:00        5:00
20    3     6:00        7:00
Window:
time_window = Window.partitionBy($"col1").orderBy($"col2")
The use case is:
if the combination (col1, max(col2)) === 1: then new_col = (unix_timestamp(TimeStamp1) - unix_timestamp(TimeStamp2)).over(time_window)
otherwise:
// I only need TimeStamp1 to create the lag
new_col = (unix_timestamp($"TimeStamp1") - unix_timestamp(lag($"TimeStamp1", 1))).over(time_window)
Code:
df.withColumn("new_col",
  when(max($"col1").over(time_window) === "1",
    (unix_timestamp($"TimeStamp1") - unix_timestamp($"TimeStamp2"))
      .otherwise((unix_timestamp($"TimeStamp1") - unix_timestamp(lag($"TimeStamp1", 1).over(time_window))) / 3600.0)))
Error:
java.lang.IllegalArgumentException: otherwise() can only be applied on a Column previously generated by when()
关于我要去哪里的任何建议,或者有其他方法可以实现此建议。谢谢。