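Using Spark v1.0-rc3, I get an error when implementing MLlib's linear regression. I ended up copying and pasting the Scala linear regression from Spark's MLlib example code, but I still get the error: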
scala> val parsedData = data.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble, parts(1).split(' ').map(x => x.toDouble).toArray)
}
<console>:28: error: polymorphic expression cannot be instantiated to expected type;
found : [U >: Double]Array[U]
required: org.apache.spark.mllib.linalg.Vector
LabeledPoint(parts(0).toDouble, parts(1).split(' ').map(x => x.toDouble).toArray)
The error says that org.apache.spark.mllib.linalg.Vector is required, but importing it does not help. Even after trying several ways of converting to a Vector, I get:
<console>:19: error: type mismatch;
found : scala.collection.immutable.Vector[Array[Double]]
Answer 0 (score: 3)
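The problem is caused by changes in later versions: code that worked in v0.9.1 needs to be adapted for v1.0. You can find the latest docs here. The fix is to import Vectors, not Vector, despite what the error message suggests. Try: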
import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
// Load and parse the data
val data = sc.textFile("mllib/data/ridge-data/lpsa.data")
val parsedData = data.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(x => x.toDouble)))
}
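The second error in the question (found: scala.collection.immutable.Vector[Array[Double]]) most likely comes from the name clash between Scala's built-in Vector collection and MLlib's Vector type. The sketch below is plain Scala and needs no Spark; the helper name parseLine and the sample line are made up for illustration. It performs the same parsing step and shows what an unqualified Vector(...) call actually builds.

```scala
// Hypothetical helper (not part of Spark): parse one "label,f1 f2 ..." line
// into the label and the raw feature array.
def parseLine(line: String): (Double, Array[Double]) = {
  val parts = line.split(',')
  val label = parts(0).toDouble
  val features = parts(1).split(' ').map(_.toDouble)
  (label, features)
}

// Made-up sample line in the same "label,features" layout as the code above.
val parsed = parseLine("-0.43,-1.63 -2.0 0.5")

// Unqualified `Vector` resolves to scala.collection.immutable.Vector,
// so this wraps the array in a Scala collection instead of an MLlib vector.
val wrapped: scala.collection.immutable.Vector[Array[Double]] = Vector(parsed._2)

// In spark-shell with MLlib imported, the working equivalent would be:
//   LabeledPoint(parsed._1, Vectors.dense(parsed._2))
```

Vectors.dense takes the Array[Double] directly, so the trailing .toArray from the original snippet is not needed.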