I am trying to create a RowMatrix from an RDD of SparseVectors, but I get the following error:
<console>:37: error: type mismatch;
found : dataRows.type (with underlying type org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.SparseVector])
required: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]
Note: org.apache.spark.mllib.linalg.SparseVector <: org.apache.spark.mllib.linalg.Vector (and dataRows.type <: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.SparseVector]), but class RDD is invariant in type T.
You may wish to define T as +T instead. (SLS 4.5)
val svd = new RowMatrix(dataRows.persist()).computeSVD(20, computeU = true)
My code is:
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.linalg._
import org.apache.spark.{SparkConf, SparkContext}
val DATA_FILE_DIR = "/user/cloudera/data/"
val DATA_FILE_NAME = "dataOct.txt"
val dataRows = sc.textFile(DATA_FILE_DIR.concat(DATA_FILE_NAME)).map(line => Vectors.dense(line.split(" ").map(_.toDouble)).toSparse)
val svd = new RowMatrix(dataRows.persist()).computeSVD(20, computeU = true)
My input data file is roughly 150 rows by 50,000 columns of space-separated integers.
I am running:
Spark: Version 1.5.0-cdh5.5.1
Java: 1.7.0_67
Answer (score: 1)
Just provide an explicit type annotation, either for the RDD:
val dataRows: org.apache.spark.rdd.RDD[Vector] = ???
or for the result of the anonymous function:
...
.map(line => Vectors.dense(line.split(" ").map(_.toDouble)).toSparse: Vector)
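For completeness, here is a minimal sketch of the original pipeline with these fixes applied, assuming the same sc, file path, and Spark 1.5 MLlib API from the question. The ": Vector" ascription upcasts each SparseVector, so the RDD's element type is inferred as Vector, which is what the RowMatrix constructor expects; because RDD is invariant in its type parameter, RDD[SparseVector] would otherwise not be accepted.

import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.rdd.RDD

val DATA_FILE_DIR = "/user/cloudera/data/"
val DATA_FILE_NAME = "dataOct.txt"

// The ": Vector" ascription makes the map produce RDD[Vector] instead of
// RDD[SparseVector]. The explicit RDD[Vector] annotation on the val shows the
// first option as well; either form alone is sufficient.
val dataRows: RDD[Vector] = sc.textFile(DATA_FILE_DIR.concat(DATA_FILE_NAME))
  .map(line => Vectors.dense(line.split(" ").map(_.toDouble)).toSparse: Vector)

// RowMatrix takes an RDD[Vector], so this now compiles.
val svd = new RowMatrix(dataRows.persist()).computeSVD(20, computeU = true)

With the annotated val, the compiler infers the map's type parameter as Vector (the lambda's SparseVector result still conforms, since function return types are covariant), so no change to the data itself is needed.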