Cast error: converting ArrayType(DoubleType,true) to DoubleType

Date: 2020-01-22 22:53:56

Tags: scala apache-spark

I have a parquet file containing id and features columns. When I try to cast the features column, I get this error:

cannot cast ArrayType(DoubleType,true) to DoubleType

val training = spark.read.parquet("/usr/local/spark/dataset/data/user")
val df = training.selectExpr("cast(id as int) id", "cast(features as double) features")
val assembler = new VectorAssembler().setInputCols(Array("features")).setOutputCol("feature")
val data = assembler.transform(df)

This error appears:

cannot resolve 'CAST(`features` AS DOUBLE)' due to data type mismatch: cannot cast ArrayType(DoubleType,true) to DoubleType; line 1 pos 0;

How can I fix this?

Edit: after applying the answer's suggestion, this error appears:

java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to [D
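This second error is unrelated to Spark's cast rules: array columns come back from a Row as a Seq (a WrappedArray wrapping the underlying array), not as a raw Array[Double] (which the JVM prints as "[D"), so asking getAs for Array[Double] triggers a ClassCastException. A minimal sketch of the same failure mode in plain Scala, with no Spark involved (the names here are illustrative):

```scala
object WrappedArrayDemo {
  def main(args: Array[String]): Unit = {
    // Assigning an Array[Double] to a Seq wraps it (WrappedArray / ArraySeq),
    // which is how Spark hands array columns back from a Row.
    val fromRow: Seq[Double] = Array(1.0, 2.0, 3.0)

    // The runtime class is a wrapper, not the raw double array class "[D",
    // so casting it to Array[Double] would throw ClassCastException.
    println(fromRow.getClass.getName)
    println(fromRow.isInstanceOf[Array[Double]]) // false

    // The safe route: copy the elements into a real Array[Double] with toArray.
    val arr: Array[Double] = fromRow.toArray
    println(arr.mkString(","))
  }
}
```

The same applies inside the answer's map: read the column as Seq[Double] (or use Row.getSeq) and call toArray before passing it to Vectors.dense.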

1 Answer:

Answer 0: (score: 0)

The features column contains an array of DoubleType, so it cannot be cast to a single DoubleType. Instead, convert the column to a Vector with Vectors.dense; VectorAssembler can then operate on columns containing doubles and vectors.

Something like this:

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Vectors
import spark.implicits._

val training = spark.read.parquet("/usr/local/spark/dataset/data/user")
// Spark returns array columns as Seq (WrappedArray), not Array[Double],
// so read them as Seq and convert with toArray before building the Vector.
val df = training.map { r =>
  (Vectors.dense(r.getAs[Seq[Double]]("features").toArray), r.getAs[Double]("id"))
}.toDF("features", "id")
val assembler = new VectorAssembler().setInputCols(Array("features")).setOutputCol("feature")
val data = assembler.transform(df)