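How do I convert a variable of the ML sparse vector type to the MLlib sparse vector type?

Asked: 2016-11-11 20:08:42

Tags: scala apache-spark machine-learning

When I try to create labeled points from the output of a vector transformer, I run into the following problem: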

    val realout = output.select("label", "features").rdd.map(row => LabeledPoint(
      row.getAs[Double]("label"),
      row.getAs[org.apache.spark.mllib.linalg.SparseVector]("features")
    ))

The error I get is:

[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 13.0 failed 1 times, most recent failure: Lost task 0.0 in stage 13.0 (TID 13, localhost): java.lang.ClassCastException: org.apache.spark.ml.linalg.SparseVector cannot be cast to org.apache.spark.mllib.linalg.Vector
[error]     at DataCleaning$$anonfun$1.apply(DataCleaning.scala:107)
[error]     at DataCleaning$$anonfun$1.apply(DataCleaning.scala:105)
[error]     at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
[error]     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
[error]     at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:213)

I looked at the solution given in link 1, which explains how to convert vectors in Spark 2.0.0, but when I tried it I got the following compile error:

object linalg is not a member of package org.apache.spark.ml

Please help. Thanks!

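1 Answer:

Answer 0 (score: 2)

org.apache.spark.mllib.linalg.SparseVector has a static method named fromML that converts the new linalg type (spark.ml) to the spark.mllib type. It can be used to convert an ML sparse vector to an MLlib sparse vector. Keep in mind that it only copies references, so the converted vector shares its underlying arrays with the original.

You can use it as follows: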

fromML
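As a minimal sketch of that call, assuming Spark 2.x with both the spark.ml and spark.mllib linalg packages on the classpath: the object name FromMLSketch, the helper toLabeledPoints, the import aliases, and the sample values are all hypothetical and chosen only for illustration. The helper uses Vectors.fromML, the companion method in the same package that accepts any spark.ml vector, so it should also cope with rows whose "features" column holds a dense vector.

```scala
import org.apache.spark.ml.linalg.{SparseVector => MLSparseVector, Vector => MLVector}
import org.apache.spark.mllib.linalg.{SparseVector => MLlibSparseVector, Vectors => MLlibVectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

object FromMLSketch {
  // A spark.ml sparse vector of size 5 with non-zeros at indices 1 and 3 (made-up data).
  val mlVec: MLSparseVector = new MLSparseVector(5, Array(1, 3), Array(2.0, 4.0))

  // Convert it to the spark.mllib sparse vector type.
  val mllibVec: MLlibSparseVector = MLlibSparseVector.fromML(mlVec)

  // The converted vector is accepted by the spark.mllib LabeledPoint.
  val point: LabeledPoint = LabeledPoint(1.0, mllibVec)

  // Hypothetical helper for a DataFrame shaped like the one in the question:
  // a Double "label" column and a spark.ml vector "features" column.
  def toLabeledPoints(output: DataFrame): RDD[LabeledPoint] =
    output.select("label", "features").rdd.map { row =>
      LabeledPoint(
        row.getAs[Double]("label"),
        // Read the column as the spark.ml type first, then convert.
        MLlibVectors.fromML(row.getAs[MLVector]("features"))
      )
    }
}
```

The key difference from the code in the question is that the column is read as the spark.ml type it actually contains and only then converted, rather than being cast directly to the spark.mllib type.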

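See the Spark documentation: https://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/mllib/linalg/SparseVector.html

P.S.: This link points to the Java API docs, while my example code is in Scala. That is not a problem, because Scala and Java are interoperable: code written in either language can call methods defined in the other.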