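I have two fields of type java.sql.Timestamp in my DataFrame, and I want to find the number of days between these two columns.
My data is in the following format: 2016-12-23 23:56:02.0 (yyyy-MM-dd HH:mm:ss.S)
I have tried many approaches but have not found a solution. Any help would be appreciated.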
Answer 0 (score: 0)
org.apache.spark.sql.functions is a treasure trove. For example, its datediff method does exactly what you are asking for (see its ScalaDoc).
An example:
val spark: SparkSession = ??? // your spark session
val sc: SparkContext = ??? // your spark context
import spark.implicits._ // to better work with spark sql
import java.sql.Timestamp
final case class Data(id: Int, from: Timestamp, to: Timestamp)
val ds =
  spark.createDataset(sc.parallelize(Seq(
    Data(1, Timestamp.valueOf("2017-01-01 00:00:00"), Timestamp.valueOf("2017-01-11 00:00:00")),
    Data(2, Timestamp.valueOf("2017-01-01 00:00:00"), Timestamp.valueOf("2017-01-21 00:00:00")),
    Data(3, Timestamp.valueOf("2017-01-01 00:00:00"), Timestamp.valueOf("2017-01-23 00:00:00")),
    Data(4, Timestamp.valueOf("2017-01-01 00:00:00"), Timestamp.valueOf("2017-01-07 00:00:00"))
  )))
import org.apache.spark.sql.functions._
ds.select($"id", datediff($"from", $"to")).show()
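Running this snippet produces the following output. The values are negative because datediff takes the end date as its first argument and the start date as its second, and the call above passes from first: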
+---+------------------+
| id|datediff(from, to)|
+---+------------------+
|  1|               -10|
|  2|               -20|
|  3|               -22|
|  4|                -6|
+---+------------------+
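If you want positive day counts, one option is to pass the later column first, since datediff(end, start) returns the number of days from start to end. The sketch below is a minimal illustration rather than a definitive setup: the local[*] master, the app name, the column alias days, and the second sample row (built from the timestamp format in the question) are assumptions made for the example. It also shows that datediff compares calendar dates only, so two timestamps a few minutes apart but on different days still count as one day.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.datediff

// Assumption: a local session for illustration; reuse your own session in practice.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("datediff-sketch")
  .getOrCreate()
import spark.implicits._

val df = Seq(
  // Exactly ten days apart.
  (1, Timestamp.valueOf("2017-01-01 00:00:00"), Timestamp.valueOf("2017-01-11 00:00:00")),
  // Under five minutes apart, but on two different calendar days.
  (2, Timestamp.valueOf("2016-12-23 23:56:02.0"), Timestamp.valueOf("2016-12-24 00:01:00.0"))
).toDF("id", "from", "to")

// End column first, start column second: the result is (to - from) in whole days.
val result = df.select($"id", datediff($"to", $"from").as("days"))
result.show()
// id 1 -> 10 days, id 2 -> 1 day (the time of day is ignored)
```

If you need elapsed 24-hour periods rather than calendar-day differences, datediff is not the right tool; you would have to compute the difference from the timestamps themselves.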