Converting a string column to a date with a specific format

Asked: 2020-01-14 12:36:04

Tags: scala, date, apache-spark

How can I convert a string column to a date column in a Spark DataFrame while keeping the same format?

I want to convert a string column to a date by specifying a format, but the converted date always comes out in the default yyyy-MM-dd format.

However, I want the date to keep the same format as the original string value (and I want the data type to be date, not string).

For example:

 val spark = SparkSession.builder().master("local").appName("appName").getOrCreate()
 import spark.implicits._

 // Col2 is a String column holding dates in MMddyyyy format
 val df = List(("1", "01132019"), ("2", "01142019")).toDF("Col1", "Col2")

 import org.apache.spark.sql.functions._

 // I need Col3 as a Date type in MMddyyyy format, but it is converted to yyyy-MM-dd
 val df1 = df.withColumn("Col3", to_date($"Col2", "MMddyyyy"))

 // I also tried this, but it gives Col3 as a String, and I need a Date
 val df2 = df.withColumn("Col3", date_format(to_date($"Col2", "MMddyyyy"), "MMddyyyy"))

1 Answer:

Answer 0 (score: 0)

This is not possible: Spark's DateType carries no per-column display format, and date values are always rendered as yyyy-MM-dd.

If you need the field in MMddyyyy format, keep it stored as a String type (casting such a string directly to the date type yields null), and change the format and convert to the date type only while processing.

For example:

df.withColumn("Col3",$"col2".cast("date")) //casting col2 as date datatype Results null
  .withColumn("col4",to_date($"col2","MMddyyyy").cast("date")) //changing format and casting as date type
  .show(false)

Result:

+----+--------+----+----------+
|Col1|    Col2|Col3|      col4|
+----+--------+----+----------+
|   1|01132019|null|2019-01-13|
|   2|01142019|null|2019-01-14|
+----+--------+----+----------+

Schema:

df.withColumn("Col3",$"col2".cast("date"))
  .withColumn("col4",to_date($"col2","MMddyyyy").cast("date"))
  .printSchema

Result:

root
 |-- Col1: string (nullable = true)
 |-- Col2: string (nullable = true)
 |-- Col3: date (nullable = true)
 |-- col4: date (nullable = true)
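
The practical pattern, then, is to parse the String once into a date column, do all date logic on that column, and re-serialize with date_format only when writing the result out. Below is a minimal sketch, reusing the df and imports from the question; the date_add step is just an illustrative transformation, and note that date_format returns a String, which is exactly why the original requirement of a Date column displayed as MMddyyyy cannot be met:

 df.withColumn("asDate", to_date($"Col2", "MMddyyyy"))           // Date type; displays as yyyy-MM-dd
   .withColumn("nextDay", date_add($"asDate", 1))                // date arithmetic works on the Date column
   .withColumn("forOutput", date_format($"nextDay", "MMddyyyy")) // back to an MMddyyyy String for storage
   .show(false)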