How to get the number of days between two java.sql.Timestamp fields in Scala

Time: 2017-07-04 18:01:40

Tags: java scala apache-spark apache-spark-sql sql-timestamp

My DataFrame has two fields of type java.sql.Timestamp, and I want to find the number of days between these two columns.

My data has the following format: 2016-12-23 23:56:02.0 (yyyy-MM-dd HH:mm:ss.S)

I have tried many approaches but have not found a solution. Any help would be appreciated.

1 Answer:

Answer 0: (score: 0)

org.apache.spark.sql.functions is a treasure trove. For example, the datediff method does exactly what you are asking for; see its entry in the ScalaDoc.

An example:

import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

val spark: SparkSession = ??? // your Spark session
val sc: SparkContext = ??? // your Spark context

import spark.implicits._ // to better work with spark sql

import java.sql.Timestamp

final case class Data(id: Int, from: Timestamp, to: Timestamp)

val ds =
  spark.createDataset(sc.parallelize(Seq(
    Data(1, Timestamp.valueOf("2017-01-01 00:00:00"), Timestamp.valueOf("2017-01-11 00:00:00")),
    Data(2, Timestamp.valueOf("2017-01-01 00:00:00"), Timestamp.valueOf("2017-01-21 00:00:00")),
    Data(3, Timestamp.valueOf("2017-01-01 00:00:00"), Timestamp.valueOf("2017-01-23 00:00:00")),
    Data(4, Timestamp.valueOf("2017-01-01 00:00:00"), Timestamp.valueOf("2017-01-07 00:00:00"))
  )))

import org.apache.spark.sql.functions._

ds.select($"id", datediff($"from", $"to")).show()

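Running this snippet produces the output below. The values are negative because datediff takes the end date first and the start date second, that is datediff(end, start), and here from is passed as the end. Swapping the arguments to datediff($"to", $"from") gives positive day counts: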

+---+------------------+
| id|datediff(from, to)|
+---+------------------+
|  1|               -10|
|  2|               -20|
|  3|               -22|
|  4|                -6|
+---+------------------+
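
For a quick sanity check outside Spark, the same calendar-day difference can be sketched in plain Scala with java.time. This is only a minimal sketch and not part of the original answer: the object name DaysBetweenSketch and the helper daysBetween are hypothetical, and it assumes that, like datediff, only the date part matters and the time of day is ignored.

```scala
import java.sql.Timestamp
import java.time.temporal.ChronoUnit

object DaysBetweenSketch {
  // Hypothetical helper: whole calendar days from `start` to `end`,
  // ignoring the time of day (positive when `end` is the later date).
  def daysBetween(start: Timestamp, end: Timestamp): Long =
    ChronoUnit.DAYS.between(
      start.toLocalDateTime.toLocalDate,
      end.toLocalDateTime.toLocalDate
    )

  def main(args: Array[String]): Unit = {
    val from = Timestamp.valueOf("2017-01-01 00:00:00")
    val to   = Timestamp.valueOf("2017-01-11 00:00:00")

    println(daysBetween(from, to)) // 10
    println(daysBetween(to, from)) // -10, same sign as datediff($"from", $"to")
  }
}
```

The argument order here is (start, end), the reverse of Spark's datediff(end, start), so it is worth checking which convention a given call site expects.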