I have a DataFrame with a Date column. I do the following to extract one of the values:
df.agg(min(substring($"nom_fic", 17, 10))).first.get(0) // gives a variable with type Any
How can I convert it to a Date type? I tried:
dtmin = df.agg(min(substring($"nom_fic", 17, 10))).first.get(0).asInstanceOf[Date]
and it returns:
java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Date
Thanks!
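The `ClassCastException` arises because `first.get(0)` returns the raw substring as a `String`, and `asInstanceOf` only reinterprets the reference, it does not parse anything. A minimal sketch of the fix, parsing the string instead of casting it (the `raw` value here is a hypothetical stand-in for what `first.get(0)` returns):

```scala
import java.sql.Date // java.sql.Date extends java.util.Date

// Hypothetical value: first.get(0) yields the substring as a String,
// e.g. "2019-11-20", not a java.util.Date -- hence the ClassCastException.
val raw: Any = "2019-11-20"

// Parse the String rather than casting it:
val dtmin: Date = Date.valueOf(raw.asInstanceOf[String])

println(dtmin) // 2019-11-20
```

`Date.valueOf` expects the `yyyy-MM-dd` format; a differently formatted string would need a `SimpleDateFormat` or `DateTimeFormatter` instead.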
Answer 0 (score: 1)
With Spark >= 2.2, to_timestamp can be used as shown below:
import org.apache.spark.sql.functions.to_timestamp
scala> df.show(10)
+-------------------+
| dts|
+-------------------+
|11/26/2019 01:01:01|
|11/20/2019 01:01:01|
+-------------------+
scala> val ts = to_timestamp($"dts", "MM/dd/yyyy HH:mm:ss")
scala> val new_df = df.withColumn("ts", ts)
scala> new_df.show(10)
+-------------------+-------------------+
| dts| ts|
+-------------------+-------------------+
|11/26/2019 01:01:01|2019-11-26 01:01:01|
|11/20/2019 01:01:01|2019-11-20 01:01:01|
+-------------------+-------------------+
scala> val min_val = new_df.agg(min("ts")).first.get(0)
min_val: Any = 2019-11-20 01:01:01.0
scala> val max_val = new_df.agg(max("ts")).first.get(0)
max_val: Any = 2019-11-26 01:01:01.0
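The aggregated values above still come back as `Any` because `Row.get` is untyped. A sketch of extracting them with a concrete type, assuming the `new_df` with the `ts` timestamp column from above (this needs a live Spark session, so it is not runnable standalone):

```scala
import java.sql.Timestamp
import org.apache.spark.sql.functions.{min, max}

// Row.getTimestamp(i) returns java.sql.Timestamp instead of Any
val min_ts: Timestamp = new_df.agg(min("ts")).first.getTimestamp(0)

// Equivalent, via the generic typed accessor:
val max_ts: Timestamp = new_df.agg(max("ts")).first.getAs[Timestamp](0)
```

Since `java.sql.Timestamp` extends `java.util.Date`, either value can be used wherever a `Date` is expected.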