Converting ArrayType(DoubleType,true) to DoubleType
val training = spark.read.parquet("/usr/local/spark/dataset/data/user")
val df = training.selectExpr("cast(id as int) id","cast(features as double) features")
val assembler = new VectorAssembler().setInputCols(Array("features" )).setOutputCol("feature")
val data = assembler.transform(df)
This error occurs:
cannot resolve 'CAST(`features` AS DOUBLE)' due to data type mismatch: cannot cast ArrayType(DoubleType,true) to DoubleType; line 1 pos 0;
How can I fix this?
After the edit, I get this error:
java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to [D
Answer 0 (score: 0)
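The "features" column contains an array of DoubleType, so it cannot be cast to DoubleType. You can use Vectors.dense to convert this column to a Vector, then use VectorAssembler on the columns that contain doubles and vectors.
Something like this: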
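import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Vectors
import spark.implicits._ // encoders needed by map and toDF

val training = spark.read.parquet("/usr/local/spark/dataset/data/user")
// Spark hands array columns back as a Seq (WrappedArray), not Array[Double];
// reading it as Seq[Double] avoids the ClassCastException. Assumes id is stored as a double.
val df = training.map { r =>
  (Vectors.dense(r.getAs[Seq[Double]]("features").toArray), r.getAs[Double]("id"))
}.toDF("features", "id")
val assembler = new VectorAssembler().setInputCols(Array("features")).setOutputCol("feature")
val data = assembler.transform(df)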
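As an alternative to mapping over rows, the conversion can also be done with a UDF, which leaves the other columns and their types untouched. The following is only a minimal sketch under some assumptions: it targets Spark 2.x or later with spark.ml, it assumes the arrays contain no null elements, and the object name ArrayToVectorSketch, the helpers toDense and withVectorColumn, and the sample rows are made up for illustration (they stand in for the Parquet file in the question).

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical names throughout; only the Spark calls come from the library.
object ArrayToVectorSketch {
  // Plain function, so the conversion can be checked without a Spark session.
  def toDense(xs: Seq[Double]): Vector = Vectors.dense(xs.toArray)

  // Replace the array<double> column `name` with a vector column of the same name.
  def withVectorColumn(df: DataFrame, name: String): DataFrame = {
    // The UDF takes Seq[Double] because Spark passes array columns as a Seq.
    val toVector = udf((xs: Seq[Double]) => toDense(xs))
    df.withColumn(name, toVector(col(name)))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("array-to-vector")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows standing in for the Parquet data from the question.
    val training = Seq((1, Seq(0.1, 0.2)), (2, Seq(0.3, 0.4))).toDF("id", "features")

    val df = withVectorColumn(training, "features")

    // VectorAssembler accepts numeric and vector input columns.
    val assembler = new VectorAssembler()
      .setInputCols(Array("id", "features"))
      .setOutputCol("feature")
    assembler.transform(df).show(false)

    spark.stop()
  }
}
```

If you are on Spark 3.1 or later, org.apache.spark.ml.functions.array_to_vector should cover the same conversion without a hand-written UDF; check the documentation for your version before relying on it.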