How to create a UDF with a Timestamp / Date type parameter in Spark

Asked: 2018-08-22 07:02:04

Tags: apache-spark apache-spark-sql

I am trying to create a UDF in Spark 2.2 with the following code:

spark.udf.register(
  "DAYOFWEEK",
  (timestamp: java.sql.Timestamp) => {
    val cal = Calendar.getInstance()
    cal.setTime(timestamp)
    cal.get(Calendar.DAY_OF_WEEK)
  }
)

Later, when I run the following SQL query:

SELECT DAYOFWEEK(now())

the following exception is thrown:

cannot resolve 'UDF:DAYOFWEEK(current_timestamp())' due to data type mismatch: argument 1 requires bigint type, however, 'current_timestamp()' is of timestamp type.; line 1 pos 7;

What am I doing wrong?

2 Answers:

Answer 0 (score: 0)

scala> sqlContext.udf.register(
     | "DAYOFWEEK",
     | (timestamp: java.sql.Timestamp) => {
     |   val cal = Calendar.getInstance()
     |   cal.setTime(timestamp)
     |   cal.get(Calendar.DAY_OF_WEEK)
     | });
res16: org.apache.spark.sql.UserDefinedFunction = UserDefinedFunction(<function1>,IntegerType,List(TimestampType))

scala> 

scala> val dd = sqlContext.sql("select DAYOFWEEK(now())")
dd: org.apache.spark.sql.DataFrame = [_c0: int]


scala> dd.show
+---+
|_c0|
+---+
|  4|
+---+
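The UDF body above can also be exercised outside Spark as a plain Scala function, which makes the `Calendar.DAY_OF_WEEK` numbering (1 = Sunday … 7 = Saturday) easy to check. A minimal sketch; `dayOfWeek` is a hypothetical standalone name, not part of the original answer:

```scala
import java.sql.Timestamp
import java.util.Calendar

// Same logic as the registered UDF: extract the day of week
// (Calendar.DAY_OF_WEEK: 1 = Sunday ... 7 = Saturday).
def dayOfWeek(timestamp: Timestamp): Int = {
  val cal = Calendar.getInstance()
  cal.setTime(timestamp)
  cal.get(Calendar.DAY_OF_WEEK)
}

// 2018-08-22 was a Wednesday, so this prints 4.
println(dayOfWeek(Timestamp.valueOf("2018-08-22 07:02:04")))
```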

Answer 1 (score: 0)

@Constantine, thanks for the suggestion. The problem was that a UDF with the same name had already been registered, but taking a Date as its parameter:

udf.register(
  "DAYOFWEEK",
  (date: Date) => {
    val cal = Calendar.getInstance()
    cal.setTime(date)
    cal.get(Calendar.DAY_OF_WEEK)
  }
)

Once the session has only a single DAYOFWEEK UDF registered, it works as expected.
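As a side note, the legacy Calendar API inside the UDF body can be swapped for java.time. A sketch under my own naming (`dayOfWeekIso` is hypothetical), with the caveat that the numbering differs: `java.time.DayOfWeek` uses ISO numbering (Monday = 1 … Sunday = 7), while `Calendar.DAY_OF_WEEK` uses Sunday = 1 … Saturday = 7:

```scala
import java.sql.Timestamp

// java.time-based variant of the UDF body.
// Returns the ISO day of week: Monday = 1 ... Sunday = 7,
// NOT the Calendar.DAY_OF_WEEK numbering used above.
def dayOfWeekIso(timestamp: Timestamp): Int =
  timestamp.toLocalDateTime.getDayOfWeek.getValue

// 2018-08-22 was a Wednesday, so this prints 3 (ISO), not 4.
println(dayOfWeekIso(Timestamp.valueOf("2018-08-22 07:02:04")))
```

Newer Spark versions (2.3 and later) also ship a built-in `dayofweek` SQL function, which would remove the need for a custom UDF entirely.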