我有一个数据框
+-----+----+----------+------------+----------+------------+
|empId| lId| date1| time1 | date2 | time2 |
+-----+----+----------+------------+----------+------------+
| 1234|1212|2018-04-20|21:40:29.077|2018-04-20|22:40:29.077|
| 1235|1212|2018-04-20|22:40:29.077|2018-04-21|00:40:29.077|
+-----+----+----------+------------+----------+------------+
需要找到每个empId的2个日期时间(以分钟为单位)之间的时差,并另存为新列。 要求的输出:
+-----+----+----------+------------+----------+------------+---------+
|empId| lId| date1| time1 | date2 | time2 |TimeDiff |
+-----+----+----------+------------+----------+------------+---------+
| 1234|1212|2018-04-20|21:40:29.077|2018-04-20|22:40:29.077|60 |
| 1235|1212|2018-04-20|22:40:29.077|2018-04-21|00:40:29.077|120 |
+-----+----+----------+------------+----------+------------+---------+
答案 0 :(得分:3)
您可以concat
date
和time
将其转换为timestamp
并在几分钟内找到difference
,如下所示
import org.apache.spark.sql.functions._
val format = "yyyy-MM-dd HH:mm:ss.SSS" //datetime format after concat
val newDF = df1.withColumn("TimeDiffInMinute",
abs(unix_timestamp(concat_ws(" ", $"date1", $"time1"), format).cast("long")
- (unix_timestamp(concat_ws(" ", $"date2", $"time2"), format)).cast("long") / 60D
)
unix_timestamp
要将datetime
转换为timestamp
,减去timestamp
会产生seconds
并除以60会产生minutes
。< / p>
输出:
+-----+----+----------+------------+----------+------------+---------+
|empId| lId| date1| time1| date2| time2|dateTime1|
+-----+----+----------+------------+----------+------------+---------+
| 1234|1212|2018-04-20|21:40:29.077|2018-04-20|22:40:29.077| 60.0|
| 1235|1212|2018-04-20|22:40:29.077|2018-04-21|00:40:29.077| 120.0|
+-----+----+----------+------------+----------+------------+---------+
希望这有帮助!