How do I find the time difference between two datetimes in Scala?

Asked: 2018-05-15 12:36:00

Tags: scala apache-spark apache-spark-sql

I have a dataframe:

+-----+----+----------+------------+----------+------------+
|empId| lId|     date1|      time1 |  date2   |    time2   |
+-----+----+----------+------------+----------+------------+
| 1234|1212|2018-04-20|21:40:29.077|2018-04-20|22:40:29.077|
| 1235|1212|2018-04-20|22:40:29.077|2018-04-21|00:40:29.077|
+-----+----+----------+------------+----------+------------+
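For reference, a minimal sketch of how this sample dataframe might be constructed (the Seq/toDF construction and string column types are assumptions, not part of the original question):

import spark.implicits._

// Hypothetical reconstruction of the sample data; all columns are kept as strings.
val df1 = Seq(
  ("1234", "1212", "2018-04-20", "21:40:29.077", "2018-04-20", "22:40:29.077"),
  ("1235", "1212", "2018-04-20", "22:40:29.077", "2018-04-21", "00:40:29.077")
).toDF("empId", "lId", "date1", "time1", "date2", "time2")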

I need to find the difference in minutes between the two datetimes for each empId and save it as a new column. Required output:

    +-----+----+----------+------------+----------+------------+---------+
    |empId| lId|     date1|      time1 |  date2   |    time2   |TimeDiff |
    +-----+----+----------+------------+----------+------------+---------+
    | 1234|1212|2018-04-20|21:40:29.077|2018-04-20|22:40:29.077|60       |
    | 1235|1212|2018-04-20|22:40:29.077|2018-04-21|00:40:29.077|120      |
    +-----+----+----------+------------+----------+------------+---------+

1 Answer:

Answer 0 (score: 3):

You can concatenate the date and time columns, convert the result to a timestamp, and take the difference in minutes, as below:

import org.apache.spark.sql.functions._
import spark.implicits._ // for the $"colName" column syntax

val format = "yyyy-MM-dd HH:mm:ss.SSS" // datetime format after concat

val newDF = df1.withColumn("TimeDiffInMinute",
  abs(unix_timestamp(concat_ws(" ", $"date1", $"time1"), format).cast("long")
    - unix_timestamp(concat_ws(" ", $"date2", $"time2"), format).cast("long")) / 60D
)

unix_timestamp converts the concatenated datetime string into a timestamp in epoch seconds; subtracting the two timestamps gives the difference in seconds, and dividing by 60 gives the difference in minutes.

Output:

+-----+----+----------+------------+----------+------------+----------------+
|empId| lId|     date1|       time1|     date2|       time2|TimeDiffInMinute|
+-----+----+----------+------------+----------+------------+----------------+
| 1234|1212|2018-04-20|21:40:29.077|2018-04-20|22:40:29.077|            60.0|
| 1235|1212|2018-04-20|22:40:29.077|2018-04-21|00:40:29.077|           120.0|
+-----+----+----------+------------+----------+------------+----------------+
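
For completeness, here is a sketch of an equivalent formulation using to_timestamp (available in Spark 2.2+); this variant is an assumption, not part of the original answer. Casting a parsed timestamp column to long also yields epoch seconds, so the arithmetic stays the same:

import org.apache.spark.sql.functions._

// Same logic via to_timestamp (Spark 2.2+): parse the concatenated strings,
// cast to epoch seconds, take the absolute difference, and convert to minutes.
val withDiff = df1.withColumn("TimeDiffInMinute",
  abs(to_timestamp(concat_ws(" ", $"date1", $"time1"), format).cast("long")
    - to_timestamp(concat_ws(" ", $"date2", $"time2"), format).cast("long")) / 60D
)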

Hope this helps!