How to convert a Mahout VectorWritable to a Vector in Spark

Asked: 2015-07-31 00:05:58

Tags: scala apache-spark mahout apache-spark-mllib

I have a sequence file generated by Mahout whose values are of type VectorWritable, and I want to convert them into Spark (MLlib) Vectors. How can I do this in Scala?

1 Answer:

Answer 0 (score: 1):

Assuming, as in the previous question, that we have an RDD rdd of (key, VectorWritable) pairs read from the sequence file, each Mahout vector can be unpacked into an array of doubles and wrapped in an MLlib dense vector:

    import scala.collection.JavaConverters.iterableAsScalaIterableConverter

    // Copy the Mahout vector's elements into a Scala Array[Double] and
    // wrap it in an MLlib dense vector.
    def mahoutToScala(v: org.apache.mahout.math.VectorWritable) = {
      val scalaArray = v.get.all.asScala.map(_.get).toArray
      org.apache.spark.mllib.linalg.Vectors.dense(scalaArray)
    }

    rdd.map { case (k, v) => (k.toString, mahoutToScala(v)) }
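For context, here is a minimal sketch of how such an RDD might be obtained from the Mahout-generated sequence file. It assumes the keys were written as org.apache.hadoop.io.Text, that sc is an existing SparkContext, and that the path seqFilePath is hypothetical:

    import org.apache.hadoop.io.Text
    import org.apache.mahout.math.VectorWritable

    // Hypothetical path to the Mahout-generated sequence file.
    val seqFilePath = "/path/to/mahout/vectors"

    // Read (Text, VectorWritable) pairs; the class arguments tell Spark which
    // Hadoop Writable types the keys and values were serialized as.
    val rdd = sc.sequenceFile(seqFilePath, classOf[Text], classOf[VectorWritable])

    // Apply the conversion from the answer above.
    val vectors = rdd.map { case (k, v) => (k.toString, mahoutToScala(v)) }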
