Schema for type not supported in Spark 2.1.0 UDF

Date: 2017-04-26 22:50:39

Tags: scala apache-spark user-defined-functions

I am working with a data type called Point(x: Double, y: Double). I am trying to use columns _c1 and _c2 as inputs to Point() and then create a new column of Point values, as follows:

val toPoint = udf{(x: Double, y: Double) => Point(x,y)}

Then I call the function:

val point = data.withColumn("Point", toPoint(wanted("c1"), wanted("c2")))

However, when I declare the udf, I get the following error:

java.lang.UnsupportedOperationException: Schema for type com.vividsolutions.jts.geom.Point is not supported
      at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:733)
      at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$2.apply(ScalaReflection.scala:729)
      at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$2.apply(ScalaReflection.scala:728)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.immutable.List.foreach(List.scala:381)
      at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
      at scala.collection.immutable.List.map(List.scala:285)
      at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:728)
      at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:671)
      at org.apache.spark.sql.functions$.udf(functions.scala:3084)
      ... 48 elided

I have imported this data type correctly and have used it many times before. Now that I try to include it in the Schema of my udf, it is not recognized. What is the way to include types beyond the standard Int, String, Array, etc.?

I am using Spark 2.1.0 on Amazon EMR.

I have referenced some related questions here:

How to define schema for custom type in Spark SQL?

Spark UDF error - Schema for type Any is not supported

1 answer:

Answer 0: (score: 0)

You should define Point as a case class:

case class Point(x: Double, y: Double)

or, if you prefer:

case class MyPoint(x: Double, y: Double) extends com.vividsolutions.jts.geom.Point(x, y)

The schema for such a class is inferred automatically by Spark.
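To make the case-class approach concrete, here is a minimal sketch of the fixed pipeline. It assumes a plain `case class Point` (rather than the JTS `com.vividsolutions.jts.geom.Point`, for which Catalyst cannot derive a schema) and a hypothetical DataFrame `data` with numeric columns "c1" and "c2":

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

// A product type Catalyst can reflect on: it becomes a struct<x: double, y: double> column.
case class Point(x: Double, y: Double)

object PointUdfExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("point-udf")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the questioner's data: two numeric columns.
    val data = Seq((1.0, 2.0), (3.0, 4.0)).toDF("c1", "c2")

    // Because Point is a case class, udf() can derive its return schema.
    val toPoint = udf { (x: Double, y: Double) => Point(x, y) }

    val withPoint = data.withColumn("Point", toPoint($"c1", $"c2"))
    withPoint.printSchema()
    withPoint.show(false)

    spark.stop()
  }
}
```

The key point is that Spark SQL derives schemas through Scala reflection over product types, so any case class composed of supported field types works as a UDF return type, while an arbitrary external class such as the JTS geometry does not.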