When defining a UDT in Spark SQL, I wrote a UDT like this:
class trajUDT extends UserDefinedType[traj] {
  override def sqlType: DataType = StructType(Seq(
    StructField("id", DataTypes.StringType),
    StructField("loc", ArrayType(StructType(Seq(
      StructField("x", DataTypes.DoubleType),
      StructField("y", DataTypes.DoubleType)
    ))))
  ))
  ...
}
traj is a class:
class traj(val id: UTF8String, val loc: Array[Tuple2[Double, Double]])
I want to write a serialize function like this:
override def serialize(p: traj): GenericInternalRow = {
  new GenericInternalRow(Array[Any](p.id, p.loc.map(x => Array(x._1, x._2))))
}
But it fails, telling me the value cannot be cast to ArrayData.
I also wrote a deserialize function like this:
override def deserialize(datum: Any): traj = {
  val arr = datum.asInstanceOf[InternalRow]
  val id = arr.getUTF8String(0)
  val xytype = StructType(Seq(
    StructField("x", DataTypes.DoubleType),
    StructField("y", DataTypes.DoubleType)
  ))
  val xy = arr.getArray(1)
  val xye = xy.toArray[Tuple2[Double, Double]](xytype)
  new traj(id, xye)
}
I suspect this won't work either...
So could someone show me how to write these two conversions?
Answer 0 (score: 0)
I ran into a similar problem when working with InternalRow: constructing one from a plain Array or Seq leads to a java.lang.ClassCastException.
import org.apache.spark.sql.catalyst.InternalRow
val row = InternalRow(Array(1, 2, 3), 1L)
println(s"Row first element: ${row.getArray(0).toIntArray.toVector}")
println(s"Row second element: ${row.getLong(1)}")
java.lang.ClassCastException: [I cannot be cast to org.apache.spark.sql.catalyst.util.ArrayData
at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getArray(rows.scala:48)
at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getArray(rows.scala:195)
I fixed this by passing an ArrayData field instead of an Array or Seq, using the ArrayData.toArrayData method as follows:
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.util.ArrayData
val row = InternalRow(ArrayData.toArrayData(Array(1, 2, 3)), 1L)
println(s"Row first element: ${row.getArray(0).toIntArray.toVector}")
println(s"Row second element: ${row.getLong(1)}")
Row first element: Vector(1, 2, 3)
Row second element: 1
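Applying the same idea to the trajUDT from the question, a rough sketch of serialize/deserialize could look like the following. This is untested and assumes the sqlType shown above (a string id plus an array of two-double structs); the key point is that the loc field is handed to Catalyst as ArrayData of InternalRows rather than as a Scala Array of tuples.

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
import org.apache.spark.sql.catalyst.util.GenericArrayData

override def serialize(p: traj): GenericInternalRow = {
  // each (x, y) pair becomes a 2-field InternalRow matching the nested struct,
  // and the collection is wrapped in ArrayData instead of a plain Array
  val points = new GenericArrayData(p.loc.map { case (x, y) => InternalRow(x, y) }.toSeq)
  new GenericInternalRow(Array[Any](p.id, points))
}

override def deserialize(datum: Any): traj = {
  val row = datum.asInstanceOf[InternalRow]
  val id = row.getUTF8String(0)
  val locData = row.getArray(1)                  // ArrayData, not a Scala Array
  val loc = (0 until locData.numElements()).map { i =>
    val pt = locData.getStruct(i, 2)             // 2 = number of fields in the point struct
    (pt.getDouble(0), pt.getDouble(1))
  }.toArray
  new traj(id, loc)
}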