Question

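My data is in an RDD[LabeledPoint] (sparse0.sparseData in the code below).

I want to convert it to an RDD[(Long, Vector)] so that I can run LDA from the mllib package.

The best I have managed is a Map of (Long, Vector[Double]) pairs, parallelized into an RDD[(Long, Vector[Double])], which fails to compile when passed to LDA.run.

Trying to map directly to RDD[(Long, Vector)] fails to compile inside the .map call (error: type Vector takes type parameters).

The fact that my mapping approach seems so convoluted suggests I'm missing something obvious. Any hints would be greatly appreciated.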

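val mappedData: Map[Long, Vector[Double]] = sparse0.sparseData().collect().map {
  // Local row counter; this only works because collect() has pulled every row to the driver
  var count: Int = 0
  row =>
    count = count + 1
    new Tuple2[Long, Vector[Double]](count, row.features.toArray.toVector)
}.toMap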

val mappedRDD=spark.sparkContext.parallelize(mappedData.toSeq)

// Cluster the documents into three topics using LDA
val ldaModel = new LDA().setK(3).run(mappedRDD)
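The compile error mentioned above can be reproduced without Spark. The minimal sketch below is plain Scala (the object name VectorNameDemo is hypothetical). It shows that, unless something else is imported, the name Vector resolves to scala.collection.immutable.Vector, a generic collection type, so a bare Vector in a type position is rejected.

```scala
// Plain Scala, no Spark needed. Without an explicit import, `Vector`
// refers to scala.collection.immutable.Vector, which is generic.
object VectorNameDemo {
  // Compiles: Scala's collection Vector is given its element type.
  val local: Vector[Double] = Array(1.0, 0.0, 3.0).toVector

  // Would not compile if uncommented; Scala 2 reports
  // "type Vector takes type parameters".
  // val bad: (Long, Vector) = (0L, local)

  def main(args: Array[String]): Unit =
    println(local.mkString(", "))
}
```

This is also why the question's Map of Vector[Double] values cannot be passed to LDA.run: it holds Scala collection vectors, not mllib vectors.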

Answer 1

Scala's Vector is not the same type as mllib.linalg.Vector. I'd use zipWithIndex:

val mappedRDD = sparse0.sparseData().map(_.features).zipWithIndex().map(_.swap)
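For reference, here is a self-contained sketch of that approach end to end. It assumes spark-core and spark-mllib are on the classpath and runs Spark in local mode. The object name LdaFromLabeledPoints and the three-document toy corpus are hypothetical; the corpus stands in for sparse0.sparseData(), with feature vectors treated as term counts over a 4-word vocabulary.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.LDA
// Explicit import shadows the default scala.Vector, so `Vector` below is mllib's.
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

object LdaFromLabeledPoints {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("lda-sketch").setMaster("local[*]"))
    try {
      // Hypothetical stand-in for sparse0.sparseData(); LDA ignores the labels.
      val labeled: RDD[LabeledPoint] = sc.parallelize(Seq(
        LabeledPoint(0.0, Vectors.sparse(4, Array(0, 1), Array(2.0, 1.0))),
        LabeledPoint(1.0, Vectors.sparse(4, Array(1, 2), Array(1.0, 3.0))),
        LabeledPoint(0.0, Vectors.sparse(4, Array(2, 3), Array(4.0, 1.0)))
      ))

      // zipWithIndex() yields RDD[(Vector, Long)]; swap turns each pair into
      // (Long, Vector), the document type LDA.run expects.
      val corpus: RDD[(Long, Vector)] =
        labeled.map(_.features).zipWithIndex().map(_.swap)

      val ldaModel = new LDA().setK(3).run(corpus)
      println(s"vocabSize = ${ldaModel.vocabSize}, k = ${ldaModel.k}")
    } finally {
      sc.stop()
    }
  }
}
```

Unlike the collect/parallelize round trip in the question, the indices are assigned on the cluster, so the data never has to fit on the driver. Note that zipWithIndex triggers a Spark job when the RDD has more than one partition, because it first needs each partition's size.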

Scala: Convert RDD[LabeledPoint] to RDD[(Long, Vector)]
