Question

First of all, thank you for taking the time to read my question :)

My question is the following: in Spark with Scala, I have a DataFrame with a column containing a date string in the format dd/MM/yyyy HH:mm, for example df:

+----------------+
|date            |
+----------------+
|8/11/2017 15:00 |
|9/11/2017 10:00 |
+----------------+

I want to get the difference, in seconds, between currentDate and the date in the DataFrame, for example

df.withColumn("difference", currentDate - unix_timestamp(col(date)))

+----------------+------------+
|date            | difference |
+----------------+------------+
|8/11/2017 15:00 | xxxxxxxxxx |
|9/11/2017 10:00 | xxxxxxxxxx |
+----------------+------------+

I tried

val current = current_timestamp()
df.withColumn("difference", current - unix_timestamp(col(date)))

but I get this error

org.apache.spark.sql.AnalysisException: cannot resolve '(current_timestamp() - unix_timestamp(date, 'yyyy-MM-dd HH:mm:ss'))' due to data type mismatch: differing types in '(current_timestamp() - unix_timestamp(date, 'yyyy-MM-dd HH:mm:ss'))' (timestamp and bigint).;;

I also tried

val current = BigInt(System.currentTimeMillis / 1000)
df.withColumn("difference", current - unix_timestamp(col(date)))

and

val current = unix_timestamp(current_timestamp())
but the "difference" column is null.

Thanks

Answer 1

You have to pass the correct format to unix_timestamp (note that MM is the month, while mm is minutes):

df.withColumn("difference", current_timestamp().cast("long") - unix_timestamp(col("date"), "dd/MM/yyyy HH:mm"))

Or, in recent versions, use:

to_timestamp(col("date"), "dd/MM/yyyy HH:mm") - current_timestamp()

to get an Interval column.
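
For reference, the epoch-seconds approach can be sketched end to end as below. This is only a minimal sketch, not the one true way to do it: the object name DateDifferenceSketch, the helper withDifference, and the local SparkSession settings are made up for illustration. The pattern "d/MM/yyyy HH:mm" is also an assumption: the sample rows have single-digit days ("8/11/2017"), and some newer Spark versions parse "dd" more strictly than older ones, so a single "d" may be the safer choice.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, current_timestamp, unix_timestamp}

object DateDifferenceSketch {
  // Assumed pattern: "d" accepts one- or two-digit days, "MM" is the month,
  // "mm" is minutes. Adjust it to match the real data.
  val DatePattern = "d/MM/yyyy HH:mm"

  // Adds a "difference" column holding the number of seconds between
  // the current time and the parsed "date" column (both as epoch seconds).
  def withDifference(df: DataFrame): DataFrame =
    df.withColumn(
      "difference",
      current_timestamp().cast("long") - unix_timestamp(col("date"), DatePattern)
    )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("date-difference-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same sample rows as in the question.
    val df = Seq("8/11/2017 15:00", "9/11/2017 10:00").toDF("date")
    withDifference(df).show(truncate = false)

    spark.stop()
  }
}
```

Both operands are bigint here, which avoids the "timestamp and bigint" mismatch from the question. If the pattern does not match the strings, unix_timestamp typically yields null, and that would make the whole difference null.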
