I have a DataFrame, and applying from_unixtime seems to expose an anomaly:
scala> val bhDF4 = bhDF.withColumn("ts1", $"ts" + 28800).withColumn("ts2", from_unixtime($"ts" + 28800,"YYYYMMddhhmmss"))
bhDF4: org.apache.spark.sql.DataFrame = [user_id: int, item_id: int ... 5 more fields]
scala> bhDF4.show
+-------+-------+-------+--------+----------+----------+--------------+
|user_id|item_id| cat_id|behavior| ts| ts1| ts2|
+-------+-------+-------+--------+----------+----------+--------------+
| 1|2268318|2520377| pv|1511544070|1511572870|20171124082110|
| 1|2333346|2520771| pv|1511561733|1511590533|20171125011533|
| 1|2576651| 149192| pv|1511572885|1511601685|20171125042125|
| 1|3830808|4181361| pv|1511593493|1511622293|20171125100453|
| 1|4365585|2520377| pv|1511596146|1511624946|20171125104906|
| 1|4606018|2735466| pv|1511616481|1511645281|20171125042801|
| 1| 230380| 411153| pv|1511644942|1511673742|20171126122222|
| 1|3827899|2920476| pv|1511713473|1511742273|20171126072433|
| 1|3745169|2891509| pv|1511725471|1511754271|20171126104431|
| 1|1531036|2920476| pv|1511733732|1511762532|20171127010212|
| 1|2266567|4145813| pv|1511741471|1511770271|20171127031111|
| 1|2951368|1080785| pv|1511750828|1511779628|20171127054708|
| 1|3108797|2355072| pv|1511758881|1511787681|20171127080121|
| 1|1338525| 149192| pv|1511773214|1511802014|20171127120014|
| 1|2286574|2465336| pv|1511797167|1511825967|20171127063927|
| 1|5002615|2520377| pv|1511839385|1511868185|20171128062305|
| 1|2734026|4145813| pv|1511842184|1511870984|20171128070944|
| 1|5002615|2520377| pv|1511844273|1511873073|20171128074433|
| 1|3239041|2355072| pv|1511855664|1511884464|20171128105424|
| 1|4615417|4145813| pv|1511870864|1511899664|20171128030744|
+-------+-------+-------+--------+----------+----------+--------------+
only showing top 20 rows
Every ts2 should show a date of 20171125 or later, yet at least one anomaly shows 20171124.
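A quick way to list every suspect row is to filter on the formatted column (a minimal sketch, assuming the bhDF4 defined above; the lexicographic comparison is valid only because ts2 is a fixed-width string):
// keep only rows whose formatted ts2 sorts before 20171125
bhDF4.filter($"ts2" < "20171125000000").show(false)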
To verify, I ran a different piece of code against that anomalous value:
import java.time.{LocalDateTime, ZoneOffset}; import java.time.format.DateTimeFormatter
print(LocalDateTime.ofEpochSecond(1511572870, 0, ZoneOffset.UTC).format(DateTimeFormatter.ofPattern("yyyyMMddhhmmss")))
It does return the correct date.
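For a side-by-side check, the same single epoch value can also be pushed through Spark's own formatter with the exact pattern used for ts2 (a minimal sketch, assuming the same spark-shell session):
import org.apache.spark.sql.functions.{from_unixtime, lit}
// format the single suspect epoch with the same pattern used for the ts2 column
spark.range(1).select(from_unixtime(lit(1511572870L), "YYYYMMddhhmmss").as("spark_fmt")).show(false)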
Any ideas?