I'm new to Spark.
When I use add_months() on a timestamp-typed column, it returns a date type. How can I keep the hour:minute:seconds part?
df.where($"DEAL_ID" === "deal1" && $"POOL_ID" === "pool_1")
.select("LVALID_DEAL_DATE","LAST_PROCESS_DATE")
.withColumn("test", add_months($"LAST_PROCESS_DATE", -3))
.show
Output:
| LVALID_DEAL_DATE| LAST_PROCESS_DATE| test|
|2016-05-01 00:00:...|2016-08-01 19:38:...|2016-05-01|
Answer 0 (score: 0)
It looks like add_months only supports the date type; if a timestamp is passed in, only the date part is returned. I tried the code below using the unix_timestamp function, but it turns the HH:mm:ss part into zeros.
df.withColumn("New Dates",unix_timestamp(add_months(df("Dates"),1)).cast("timestamp")).show
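Why the time collapses to zeros can be seen without a Spark session: add_months yields a Date, so any timestamp rebuilt from it starts at midnight. A minimal plain-Scala sketch of the same behavior using java.time (the 2016-08-01 19:38 value is taken from the question's output; the -3 month shift mirrors the question's code):

```scala
import java.time.{LocalDateTime, ZoneOffset}

// A timestamp like LAST_PROCESS_DATE in the question
val original = LocalDateTime.of(2016, 8, 1, 19, 38, 0)

// add_months keeps only the date part, discarding the time of day
val shifted = original.toLocalDate.minusMonths(3) // 2016-05-01

// Rebuilding a timestamp from that date lands at midnight, which is
// what unix_timestamp(add_months(...)).cast("timestamp") produces
val rebuilt = shifted.atStartOfDay(ZoneOffset.UTC)

println(shifted)             // 2016-05-01
println(rebuilt.toLocalTime) // 00:00
```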
Answer 1 (score: 0)
The trick here is to first extract the time-of-day component in milliseconds, by converting the timestamp to a Unix epoch and back, and then add that component onto the new date produced by add_months.
Suppose we have a dataframe df as shown below:
df.show()
+---+
| id|
+---+
| 1|
| 2|
| 3|
| 4|
| 5|
+---+
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.DateType

val dfttimestamp = df.withColumn("StartDateTimeEpoch", lit(1573362092000L))
  .withColumn("StartDateTimeStamp", to_utc_timestamp(to_timestamp(col("StartDateTimeEpoch") / 1000), "UTC"))
  .withColumn("StartDateTimeTruncated", unix_timestamp(col("StartDateTimeStamp").cast(DateType)) * 1000) // truncate the time component by casting to Date
  .withColumn("StartTimeMillisDiff", col("StartDateTimeEpoch") - col("StartDateTimeTruncated")) // time-of-day component in millis
  .withColumn("StartDate_NextYr", add_months(col("StartDateTimeStamp"), 12)) // add 12 months to get next year, as a Date column
  .withColumn("StartDateTimeEpoch_NextYr", unix_timestamp(col("StartDate_NextYr")) * 1000 + col("StartTimeMillisDiff")) // convert the Date to a Unix timestamp and add the previously computed diff in millis
  .withColumn("StartDateTimeStamp_NextYr", to_utc_timestamp(to_timestamp(col("StartDateTimeEpoch_NextYr") / 1000), "UTC"))
dfttimestamp.show()
dfttimestamp.printSchema()
+---+------------------+-------------------+----------------------+-------------------+----------------+-------------------------+-------------------------+
| id|StartDateTimeEpoch| StartDateTimeStamp|StartDateTimeTruncated|StartTimeMillisDiff|StartDate_NextYr|StartDateTimeEpoch_NextYr|StartDateTimeStamp_NextYr|
+---+------------------+-------------------+----------------------+-------------------+----------------+-------------------------+-------------------------+
| 1| 1573362092000|2019-11-10 05:01:32| 1573344000000| 18092000| 2020-11-10| 1604984492000| 2020-11-10 05:01:32|
| 2| 1573362092000|2019-11-10 05:01:32| 1573344000000| 18092000| 2020-11-10| 1604984492000| 2020-11-10 05:01:32|
| 3| 1573362092000|2019-11-10 05:01:32| 1573344000000| 18092000| 2020-11-10| 1604984492000| 2020-11-10 05:01:32|
| 4| 1573362092000|2019-11-10 05:01:32| 1573344000000| 18092000| 2020-11-10| 1604984492000| 2020-11-10 05:01:32|
| 5| 1573362092000|2019-11-10 05:01:32| 1573344000000| 18092000| 2020-11-10| 1604984492000| 2020-11-10 05:01:32|
+---+------------------+-------------------+----------------------+-------------------+----------------+-------------------------+-------------------------+
root
|-- id: integer (nullable = false)
|-- StartDateTimeEpoch: long (nullable = false)
|-- StartDateTimeStamp: timestamp (nullable = true)
|-- StartDateTimeTruncated: long (nullable = true)
|-- StartTimeMillisDiff: long (nullable = true)
|-- StartDate_NextYr: date (nullable = true)
|-- StartDateTimeEpoch_NextYr: long (nullable = true)
|-- StartDateTimeStamp_NextYr: timestamp (nullable = true)
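The epoch arithmetic this answer relies on can be checked in plain Scala with java.time, no Spark session needed (all values in UTC, taken from the table above):

```scala
import java.time.{Instant, ZoneOffset}

val epochMs = 1573362092000L // StartDateTimeEpoch from the table
val zdt = Instant.ofEpochMilli(epochMs).atZone(ZoneOffset.UTC)

// Truncate to the start of the day to isolate the time-of-day component
val truncatedMs = zdt.toLocalDate.atStartOfDay(ZoneOffset.UTC).toInstant.toEpochMilli
val timeOfDayMs = epochMs - truncatedMs // 18092000 ms = 05:01:32

// Add 12 months to the date part, then add the time component back
val nextYrMs = zdt.toLocalDate.plusMonths(12)
  .atStartOfDay(ZoneOffset.UTC).toInstant.toEpochMilli + timeOfDayMs

println(Instant.ofEpochMilli(nextYrMs)) // 2020-11-10T05:01:32Z
```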