As far as I know, Vectors and DataFrames are supposed to work well together in Spark 2.0+. I have a very simple case and a strange error. I have a DataFrame with a single column of SparseVectors, and I cannot apply any Vector method to it. The error says it wants a vector as input, yet I am giving it a vector. Any idea what is going wrong here?
import org.apache.spark.sql.functions._
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector}
// This is my DataFrame: one column holding sparse vectors
vecCol
org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [normFeatures: vector]
// It doesn't matter which Vector method I use; it fails the same way every time
val vecToArray = udf((v: Vector) => v.toArray)
vecCol.withColumn("withArray",vecToArray($"normFeatures"))
org.apache.spark.sql.AnalysisException: cannot resolve 'UDF(normFeatures)' due to data type mismatch: argument 1 requires vector type, however, '`normFeatures`' is of vector type.;
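The "vector type, however, ... is of vector type" wording suggests two distinct classes that share a name. In Spark 2.0+, DataFrame-based ML uses `org.apache.spark.ml.linalg.Vector`, while the imports above pull in the legacy RDD-based `org.apache.spark.mllib.linalg.Vector`; the UDF therefore expects the old type while the column holds the new one. A minimal sketch of the likely fix (assuming `vecCol` and `normFeatures` as above, and that the column was produced by the new `ml` pipeline API):

```scala
// Import the new DataFrame-oriented linear algebra types,
// not the legacy mllib ones
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.functions.udf

// Same UDF as before, but now typed against ml.linalg.Vector,
// which matches the DataFrame's vector column type
val vecToArray = udf((v: Vector) => v.toArray)

val withArr = vecCol.withColumn("withArray", vecToArray($"normFeatures"))
```

The only change is the import: `mllib.linalg` → `ml.linalg`. The two packages coexist in Spark 2.x, and a UDF compiled against the wrong one produces exactly this confusing "vector vs. vector" mismatch.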