Using Spark 2.0 mllib.linalg.SparseVector methods on DataFrames via a UDF?

Asked: 2016-12-14 18:25:50

Tags: scala apache-spark spark-dataframe

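As far as I understand, Vectors and DataFrames should work well together in Spark 2.0+. I have a very simple case that fails with a strange error. I have a DataFrame with a single column of SparseVectors, and I cannot apply any Vector method to it through a UDF. The error says the UDF requires a vector as input, yet the column I pass in is reported as a vector too. Any idea what is going wrong here?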

import org.apache.spark.sql.functions._
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector}

// This is my dataFrame. One column, it is a sparse vector 
vecCol
org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [normFeatures: vector]

//It doesn't matter what vector method I use, it fails no matter what
val vecToArray = udf((v: Vector) => v.toArray)
vecCol.withColumn("withArray",vecToArray($"normFeatures"))
org.apache.spark.sql.AnalysisException: cannot resolve 'UDF(normFeatures)' due to data type mismatch: argument 1 requires vector type, however, '`normFeatures`' is of vector type.;
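
A likely cause, assuming `normFeatures` was produced by a DataFrame-based `spark.ml` transformer (the snippet above does not show where the column comes from): since Spark 2.0 the `spark.ml` API uses the vector classes in `org.apache.spark.ml.linalg`, while the UDF above is typed against `org.apache.spark.mllib.linalg.Vector`. Both user-defined types print as `vector`, which would explain why the message reads "requires vector type, however ... is of vector type". The sketch below is a minimal illustration under that assumption, not a confirmed diagnosis; the object name `VecToArrayExample`, the method `run`, and the sample data are hypothetical stand-ins for the real `vecCol`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf
// Assumption: the column holds ml.linalg vectors, so import from ml, not mllib.
import org.apache.spark.ml.linalg.{Vector, Vectors}

object VecToArrayExample {
  // Builds a small stand-in DataFrame, applies the UDF, and returns the arrays.
  def run(): List[List[Double]] = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("vec-to-array-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical replacement for vecCol: one column of sparse ml.linalg vectors.
    val vecCol = Seq(
      Tuple1(Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))),
      Tuple1(Vectors.sparse(3, Array(1), Array(2.0)))
    ).toDF("normFeatures")

    // Same UDF as in the question; only the imported Vector type differs.
    val vecToArray = udf((v: Vector) => v.toArray)
    val withArray = vecCol.withColumn("withArray", vecToArray($"normFeatures"))

    // Collect the new array column back to the driver for inspection.
    val result = withArray
      .select("withArray")
      .collect()
      .map(_.getSeq[Double](0).toList)
      .toList

    spark.stop()
    result
  }

  def main(args: Array[String]): Unit =
    run().foreach(println)
}
```

If later code really needs the old `mllib` types, the UDF can still accept an `ml.linalg.Vector` and convert inside with `org.apache.spark.mllib.linalg.Vectors.fromML(v)`. In the opposite case, where the column genuinely holds `mllib` vectors, `org.apache.spark.mllib.util.MLUtils.convertVectorColumnsToML` converts the column to the new type.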

0 answers:

No answers yet