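I need to calculate the difference between two dates. The problem is
Currentdate - max(day_id)
"Currentdate" is in the simple date format yyyyMMdd.
"day_id" is a string whose values are in the format yyyy-MM-dd.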
I have a DataFrame in which the date (a string) is converted to a date type (yyyy-MM-dd):
val df1 = df.withColumn("day_id", to_date(from_unixtime(unix_timestamp(col("day_id"), "yyyy-MM-dd"))))
Normally, to find max(day_id), I would do
def daySince(columnName: String): Column = {
  max(col(columnName))
}
I can't figure out how to compute the difference
Currentdate - max(day_id)
Answer 0 (score: 2)
Given an input DataFrame with the following data and schema:
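+---+----------+
|id |day_id    |
+---+----------+
|id1|2017-11-21|
|id1|2018-01-21|
|id2|2017-12-21|
+---+----------+
root
 |-- id: string (nullable = true)
 |-- day_id: string (nullable = true)
You can use the built-in functions datediff() and current_date() to meet your requirement. The following should give you the difference in days for each row, in a new diff column: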
import org.apache.spark.sql.functions._
df.withColumn("diff", datediff(current_date(), to_date(col("day_id"), "yyyy-MM-dd")))
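The line above computes a difference per row. If what you want is the single value Currentdate - max(day_id) from the question, one possible sketch is to aggregate first and then apply datediff(). The helper name daysSinceMax, the object name, and the reference date "20180201" are assumptions made up for illustration; the sample rows are the ones shown above. It also assumes Spark 2.2 or later, where to_date() accepts a format string.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object DaysSinceMax {
  // Hypothetical helper: days between a reference date given as a yyyyMMdd
  // string and the latest value of a yyyy-MM-dd string column.
  def daysSinceMax(df: DataFrame, columnName: String, currentDate: String): Int = {
    val latest = max(to_date(col(columnName), "yyyy-MM-dd"))   // max(day_id) as a date
    val reference = to_date(lit(currentDate), "yyyyMMdd")      // Currentdate as a date
    // datediff(end, start) returns the number of days from start to end.
    df.agg(datediff(reference, latest).as("days_since")).head().getInt(0)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("days-since-max")
      .getOrCreate()
    import spark.implicits._

    // Same sample rows as the table in this answer.
    val df = Seq(
      ("id1", "2017-11-21"),
      ("id1", "2018-01-21"),
      ("id2", "2017-12-21")
    ).toDF("id", "day_id")

    // Latest day_id is 2018-01-21, so against 2018-02-01 this prints 11.
    println(daysSinceMax(df, "day_id", "20180201"))

    spark.stop()
  }
}
```

To measure against today's date instead of a fixed string, replace the reference column with current_date(), as in the per-row version above.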