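I am trying to convert a column from string type to date type. I consulted:
When I applied the answer from link 1, I got null results, so I turned to the answer from link 2, but I don't understand this part: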
output_format = ... # Some SimpleDateFormat string
I wanted to ask about this directly in the comments, but alas, I don't have enough reputation.
Answer 0 (score: 7)
Hope this helps!
from pyspark.sql.functions import col, unix_timestamp, to_date

# Sample data (assumes an existing SparkContext named `sc`, as in the pyspark shell)
df = sc.parallelize([['12-21-2006'],
                     ['05-30-2007'],
                     ['01-01-1984'],
                     ['12-24-2017']]).toDF(["date_in_strFormat"])
df.printSchema()

# Parse the string with its actual pattern, cast to timestamp, then truncate to a date
df = df.withColumn('date_in_dateFormat',
                   to_date(unix_timestamp(col('date_in_strFormat'), 'MM-dd-yyyy').cast("timestamp")))
df.show()
df.printSchema()
The output is:
root
|-- date_in_strFormat: string (nullable = true)
+-----------------+------------------+
|date_in_strFormat|date_in_dateFormat|
+-----------------+------------------+
|       12-21-2006|        2006-12-21|
|       05-30-2007|        2007-05-30|
|       01-01-1984|        1984-01-01|
|       12-24-2017|        2017-12-24|
+-----------------+------------------+
root
|-- date_in_strFormat: string (nullable = true)
|-- date_in_dateFormat: date (nullable = true)
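The pattern 'MM-dd-yyyy' above means two-digit month, two-digit day, four-digit year. If you want to sanity-check the expected dates without a cluster, a rough plain-Python equivalent is sketched below. Note the assumptions: `parse_mm_dd_yyyy` is a hypothetical helper, and `datetime.strptime` is not the same parser Spark uses, so this only approximates the behaviour (including returning None where the string does not match the pattern).

```python
from datetime import date, datetime
from typing import Optional


def parse_mm_dd_yyyy(text: str) -> Optional[date]:
    """Parse text laid out as month-day-year; return None if it does not match.

    Hypothetical helper for local checks only, not part of PySpark.
    """
    try:
        # %m-%d-%Y is the strptime counterpart of the 'MM-dd-yyyy' pattern
        return datetime.strptime(text, "%m-%d-%Y").date()
    except ValueError:
        return None


samples = ["12-21-2006", "05-30-2007", "01-01-1984", "12-24-2017"]
expected = [parse_mm_dd_yyyy(s) for s in samples]

# A string in a different layout does not match and yields None
mismatch = parse_mm_dd_yyyy("2006-12-21")
```

A pattern that does not match the actual layout of the strings is a common reason for getting nulls back.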
Answer 1 (score: 3)
A simple way:
from pyspark.sql.types import DateType

# A plain cast expects strings that are already in yyyy-MM-dd form
df_1 = df.withColumn("col_with_date_format",
                     df["col_with_date_format"].cast(DateType()))
Answer 2 (score: 0)
Using the built-in to_date function is an even simpler way:
from pyspark.sql import functions as F
df = df.withColumn('col_with_date_format', F.to_date(df.col_with_str_format))