Spark DataFrame UDF - Schema for type Any is not supported

Date: 2017-09-18 08:15:34

Tags: scala apache-spark spark-dataframe user-defined-functions

I'm writing a Spark Scala UDF and running into "java.lang.UnsupportedOperationException: Schema for type Any is not supported":

import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf

val aBP = udf((bG: String, pS: String, bP: String, iOne: String, iTwo: String) => {
  if (bG != "I") {"NA"}
  else if (pS == "D")
    {if (iTwo != null) iOne else "NA"}
  else if (pS == "U")
    {if (bP != null) bP else "NA"}
})

It throws the error "java.lang.UnsupportedOperationException: Schema for type Any is not supported".

1 answer:

Answer 0 (score: 5)

As described in this link, your UDF should return one of:

  • Primitives (Int, String, Boolean, ...)
  • Tuples of other supported types
  • Lists, Arrays, or Maps of other supported types
  • Case classes of other supported types
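A minimal sketch, assuming Spark 2.x or later on the classpath and a local session, of UDFs that return each kind of supported type. The names `KeyLen`, `SupportedReturnTypes`, and the column names are hypothetical, chosen only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical case class; its fields (String, Int) are themselves supported types.
case class KeyLen(key: String, len: Int)

object SupportedReturnTypes {
  val spark: SparkSession =
    SparkSession.builder().master("local[1]").appName("udf-return-types").getOrCreate()
  import spark.implicits._

  // Primitive (String) -> StringType
  val asString = udf((s: String) => s.toUpperCase)
  // Tuple of supported types -> struct<_1: string, _2: int>
  val asTuple = udf((s: String) => (s, s.length))
  // Seq of a supported type -> array<string>
  val asSeq = udf((s: String) => s.split("=").toSeq)
  // Case class of supported types -> struct<key: string, len: int>
  val asCase = udf((s: String) => KeyLen(s.takeWhile(_ != '='), s.length))

  val df = Seq("a=1", "bb=22").toDF("raw")
    .withColumn("upper", asString(col("raw")))
    .withColumn("pair", asTuple(col("raw")))
    .withColumn("parts", asSeq(col("raw")))
    .withColumn("kv", asCase(col("raw")))
}
```

Each lambda has a concrete static return type, so Spark can derive a schema for it; `SupportedReturnTypes.df.printSchema()` should show `pair` and `kv` as structs and `parts` as an array of strings. Call `spark.stop()` when you are done with the session.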

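In your code the last `else if` has no `else`, so when none of the conditions match, the expression evaluates to `()` (Unit). Scala therefore infers the function's return type as `Any`, the common supertype of `String` and `Unit`, and Spark cannot derive a schema for `Any`. The code compiles either way; the exception is thrown at runtime when `udf(...)` inspects the return type. If you add a final `else` branch, the return type becomes `String` and the UDF is created successfully: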

  val aBP = udf((bG: String, pS: String, bP: String, iOne: String, iTwo: String) => {
    if (bG != "I") {"NA"}
    else if (pS == "D") {
      if (iTwo != null) 
        iOne 
      else "NA"
    } else if (pS == "U") {
      if (bP != null) 
        bP 
      else 
        "NA"
    } else {
      ""
    }
  })
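The root cause can be reproduced in plain Scala, without Spark. This minimal sketch (the object and value names are hypothetical) writes the question's branching and the fixed version as ordinary function values: without the final `else`, an input that matches no branch yields `()`, and the inferred result type is `Any`.

```scala
object MissingElseDemo {
  // Hypothetical name. The question's branching, with no final `else`:
  // the inferred type is (String, String, String, String, String) => Any,
  // because the lub of String and Unit is Any.
  val withoutElse = (bG: String, pS: String, bP: String, iOne: String, iTwo: String) =>
    if (bG != "I") "NA"
    else if (pS == "D") { if (iTwo != null) iOne else "NA" }
    else if (pS == "U") { if (bP != null) bP else "NA" }

  // Hypothetical name. Same logic plus a final `else`: every branch is a String,
  // so the inferred type is (String, String, String, String, String) => String.
  val withElse = (bG: String, pS: String, bP: String, iOne: String, iTwo: String) =>
    if (bG != "I") "NA"
    else if (pS == "D") { if (iTwo != null) iOne else "NA" }
    else if (pS == "U") { if (bP != null) bP else "NA" }
    else ""

  // This ascription compiles only because withElse returns String;
  // the same ascription on withoutElse would be a type error.
  val typed: (String, String, String, String, String) => String = withElse
}
```

For example, `MissingElseDemo.withoutElse("I", "X", null, null, null)` returns `()` rather than a `String`, which is exactly the case Spark cannot describe with a schema.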

You can also restructure the code using pattern matching:

// Each case mirrors one branch of the if/else version above;
// the final `case _` plays the role of the final `else`.
val aBP = udf[String, String, String, String, String, String] {
  case (bG, _, _, _, _)        if bG != "I"   => "NA"
  case (_, "D", _, iOne, iTwo) if iTwo != null => iOne
  case (_, "D", _, _, _)                       => "NA"
  case (_, "U", bP, _, _)      if bP != null   => bP
  case (_, "U", _, _, _)                       => "NA"
  case _                                       => ""
}
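A plain-Scala sketch (no Spark needed) that checks a pattern-matching function against the answer's if/else logic over a small set of inputs, including `null`. The names `BranchEquivalence`, `ifElse`, `matched`, and `values` are hypothetical; the guards use `!= null` so that they mirror the if/else version's null checks.

```scala
object BranchEquivalence {
  type F5 = (String, String, String, String, String) => String

  // The answer's if/else logic, as a plain function (hypothetical name).
  val ifElse: F5 = (bG, pS, bP, iOne, iTwo) =>
    if (bG != "I") "NA"
    else if (pS == "D") { if (iTwo != null) iOne else "NA" }
    else if (pS == "U") { if (bP != null) bP else "NA" }
    else ""

  // Pattern-matching version; a case block is accepted where a Function5 is expected,
  // so the same literal could be passed to udf[String, String, String, String, String, String].
  val matched: F5 = {
    case (bG, _, _, _, _)        if bG != "I"   => "NA"
    case (_, "D", _, iOne, iTwo) if iTwo != null => iOne
    case (_, "D", _, _, _)                       => "NA"
    case (_, "U", bP, _, _)      if bP != null   => bP
    case (_, "U", _, _, _)                       => "NA"
    case _                                       => ""
  }

  // Compare both versions on every combination of a few values, including null.
  val values: Seq[String] = Seq("I", "D", "U", "x", null)
  val mismatches: Seq[(String, String, String, String, String)] = for {
    bG <- values; pS <- values; bP <- values; iOne <- values; iTwo <- values
    if ifElse(bG, pS, bP, iOne, iTwo) != matched(bG, pS, bP, iOne, iTwo)
  } yield (bG, pS, bP, iOne, iTwo)
}
```

An empty `BranchEquivalence.mismatches` means the two versions agree on all 3,125 combinations tried; literal patterns such as `"D"` compare with `==`, so a `null` `pS` simply falls through to `case _`.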