Converting a string timestamp to a date in PySpark

Time: 2019-06-16 03:00:40

Tags: date datetime pyspark apache-spark-sql timestamp

I have a column of timestamps stored as strings. I want to convert them to dates in "yyyy-MM-dd" format.

+-------------------+
|           date_col|
+-------------------+
|2019-01-01 08:01:45|
|2019-01-02 17:17:25|
|2019-01-03 15:01:45|
+-------------------+

I would like to get '2019-01-01', '2019-01-02', '2019-01-03' as the output.

1 Answer:

Answer 0: (score: 0)

Use substring and to_date:

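from pyspark.sql import Row, SparkSession
from pyspark.sql.functions import to_date, substring, col

# Reuse the active session (or create one) so this also runs outside the PySpark shell
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([Row(date_col="2019-01-01 08:01:45"), Row(date_col="2019-01-02 17:17:25"), Row(date_col="2019-01-03 15:01:45")])

# substring positions are 1-based: keep the first 10 characters, then parse them as a date
df = df.withColumn("new_date", to_date(substring(col("date_col"), 1, 10), "yyyy-MM-dd"))

df.show()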
+-------------------+----------+
|           date_col|  new_date|
+-------------------+----------+
|2019-01-01 08:01:45|2019-01-01|
|2019-01-02 17:17:25|2019-01-02|
|2019-01-03 15:01:45|2019-01-03|
+-------------------+----------+