I have a simple Spark DataFrame:
val df = Seq("24-12-2017","25-01-2016").toDF("dates")
df.show()
+----------+
| dates|
+----------+
|24-12-2017|
|25-01-2016|
+----------+
To convert it to the desired format, I use the following snippet:
import java.text.SimpleDateFormat
import org.apache.spark.sql.functions.udf

// Parse a dd-MM-yyyy string and wrap the millisecond value in a java.sql.Timestamp
def fmt(d: String) = {
  val f = new SimpleDateFormat("dd-MM-yyyy").parse(d).getTime
  new java.sql.Timestamp(f)
}

val fmtTimestamp = udf(fmt(_: String): java.sql.Timestamp)
df.select($"dates", fmtTimestamp($"dates")).show
+----------+-------------------+
| dates| UDF(dates)|
+----------+-------------------+
|24-12-2017|2017-12-24 00:00:00|
|25-01-2016|2016-01-25 00:00:00|
+----------+-------------------+
and everything works as expected.
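As an aside (a minimal sketch, assuming Spark 2.2 or later, where to_timestamp is available in org.apache.spark.sql.functions), the same conversion could also be done without a UDF:

import org.apache.spark.sql.functions.to_timestamp

// Built-in parser: converts the string column directly to TimestampType
df.select($"dates", to_timestamp($"dates", "dd-MM-yyyy")).show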
However, when I try a simplified version, everything crashes:
import java.text.SimpleDateFormat

// Same parsing logic, but the result is now a plain java.util.Date
def fmt(d: String) = {
  new SimpleDateFormat("dd-MM-yyyy").parse(d)
}

val fmtTimestamp = udf(fmt(_: String): java.util.Date)
java.lang.UnsupportedOperationException: Schema for type java.util.Date is not supported
org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:789)
org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:724)
scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:906)
org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:46)
org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:723)
org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:720)
org.apache.spark.sql.functions$.udf(functions.scala:3898)
$sess.cmd17Wrapper$Helper.<init>(cmd17.sc:7)
$sess.cmd17Wrapper.<init>(cmd17.sc:718)
$sess.cmd17$.<init>(cmd17.sc:563)
$sess.cmd17$.<clinit>(cmd17.sc:-1)
What could be the reason that the first case works and the second one crashes?
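For reference, a variant that converts to java.sql.Date instead (a hypothetical sketch with my own names fmtDate / fmtDateUdf, not part of the failing code) does get past schema derivation, so the problem seems specific to java.util.Date:

import java.text.SimpleDateFormat

// Hypothetical variant: java.sql.Date, which Spark maps to DateType
def fmtDate(d: String) = {
  new java.sql.Date(new SimpleDateFormat("dd-MM-yyyy").parse(d).getTime)
}

val fmtDateUdf = udf(fmtDate(_: String): java.sql.Date)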