I have a DataFrame whose columns hold various types. For clarity, assume it is structured as shown below, with column1 as Ints, column2 as Strings, and column3 as Floats.
+-------+-------+-------+
|column1|column2|column3|
+-------+-------+-------+
| 1| a| 0.1|
| 2| b| 0.2|
| 3| c| 0.3|
+-------+-------+-------+
I am trying to apply a UDF to all three columns so that each entry is turned into a case class, defined like this:
case class Annotation(lastUpdate: String, value: Any)
by applying the following code:
import org.apache.spark.sql.functions.{col, udf}

val columns = df.columns
val myUDF = udf { in: Any => Annotation("dummy", in) }
val finalDF = columns.foldLeft(df) { (tempDF, colName) =>
  tempDF.withColumn(colName, myUDF(col(colName)))
}
Note that on this first pass I don't care what the Annotation.lastUpdate value is. However, when I try to run this, I get the following error:
Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type scala.Any is not supported
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:762)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:704)
at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:809)
at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:703)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1$$anonfun$apply$6.apply(ScalaReflection.scala:758)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1$$anonfun$apply$6.apply(ScalaReflection.scala:757)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
I have been looking into custom encoders to solve this, but I am not sure how to apply one in this situation.
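For reference, one workaround I have considered (sidestepping a custom encoder entirely) is to avoid Any in the case class by casting every column to String before the UDF runs, since String is a type Spark's reflection-based schema inference supports. This is only a sketch under the assumption that losing the original column types is acceptable for this first pass:

```scala
import org.apache.spark.sql.functions.{col, udf}

// Assumed variant of the case class: `value` is a String instead of Any,
// so Spark can derive a schema for the UDF's return type.
case class Annotation(lastUpdate: String, value: String)

val myUDF = udf { in: String => Annotation("dummy", in) }

val finalDF = df.columns.foldLeft(df) { (tempDF, colName) =>
  // Cast each column to String first so the UDF input type is concrete.
  tempDF.withColumn(colName, myUDF(col(colName).cast("string")))
}
```

I am unsure whether this is idiomatic, or whether a custom encoder would let me keep the original types inside Annotation, which is really what I am after.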