Spark Scala: two similar UDFs for a non-standard date conversion produce different results

Asked: 2019-03-26 07:03:15

Tags: java scala apache-spark user-defined-functions simpledateformat

I have a simple Spark DataFrame:

val df = Seq("24-12-2017","25-01-2016").toDF("dates")
df.show()
+----------+
|     dates|
+----------+
|24-12-2017|
|25-01-2016|
+----------+

To convert it to the desired format, I use the following snippet:

import java.text.SimpleDateFormat

// Parse "dd-MM-yyyy" and wrap the epoch millis in a java.sql.Timestamp
def fmt(d: String): java.sql.Timestamp = {
    val f = new SimpleDateFormat("dd-MM-yyyy").parse(d).getTime
    new java.sql.Timestamp(f)
}

val fmtTimestamp = udf(fmt(_:String):java.sql.Timestamp)

df.select($"dates",fmtTimestamp($"dates")).show
+----------+-------------------+
|     dates|         UDF(dates)|
+----------+-------------------+
|24-12-2017|2017-12-24 00:00:00|
|25-01-2016|2016-01-25 00:00:00|
+----------+-------------------+
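As an aside (not in the original post), the same conversion can be done without a UDF at all, using Spark's built-in `to_timestamp`, which accepts a format pattern in Spark 2.2+. A minimal sketch (the `ts` column alias is my own choice):

```scala
import org.apache.spark.sql.functions.to_timestamp

// Parse the "dd-MM-yyyy" strings directly into a TimestampType column
df.select($"dates", to_timestamp($"dates", "dd-MM-yyyy").as("ts")).show
```

Built-in functions also let Catalyst optimize the expression, which a black-box UDF prevents.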

and everything works as expected.

However, when I try a simplified version, everything crashes:

import java.text.SimpleDateFormat

def fmt(d:String) = {
    new SimpleDateFormat("dd-MM-yyyy").parse(d)
}

val fmtTimestamp = udf(fmt(_:String):java.util.Date)

java.lang.UnsupportedOperationException: Schema for type java.util.Date is not supported
  org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:789)
  org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:724)
  scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
  org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:906)
  org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:46)
  org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:723)
  org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:720)
  org.apache.spark.sql.functions$.udf(functions.scala:3898)
  $sess.cmd17Wrapper$Helper.<init>(cmd17.sc:7)
  $sess.cmd17Wrapper.<init>(cmd17.sc:718)
  $sess.cmd17$.<init>(cmd17.sc:563)
  $sess.cmd17$.<clinit>(cmd17.sc:-1)

What could be the reason that the first case completes while the second one crashes?
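A note on the likely cause (my own reading, since the post has no answers): Spark's `ScalaReflection` can derive a Catalyst schema only for a fixed set of return types — `java.sql.Timestamp` maps to `TimestampType` and `java.sql.Date` to `DateType`, but `java.util.Date` has no mapping, so `udf(...)` fails at registration time with exactly the exception shown. A sketch of a working simplified variant, assuming a plain date (rather than a timestamp) is acceptable:

```scala
import java.text.SimpleDateFormat
import java.sql.Date

// java.sql.Date has a Catalyst mapping (DateType); java.util.Date does not,
// which is why Spark rejects it during UDF schema inference.
def fmtDate(d: String): Date = {
  val millis = new SimpleDateFormat("dd-MM-yyyy").parse(d).getTime
  new Date(millis)
}

// With a SparkSession in scope:
// val fmtDateUdf = udf(fmtDate(_: String): Date)
```

`fmtDate("24-12-2017")` yields a `java.sql.Date` for 2017-12-24, so the registered UDF would produce a `DateType` column.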

0 Answers:

No answers yet