What import is missing to make Spark's MLlib linear regression work in the Scala example?

Time: 2014-05-21 19:11:11

Tags: scala apache-spark

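Using Spark v1.0-rc3, I get an error when implementing MLlib's linear regression. So I eventually tried copy/pasting the Scala linear regression from Spark's MLlib example code, but I still get the error: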

scala> val parsedData = data.map { line =>
         val parts = line.split(',')
         LabeledPoint(parts(0).toDouble, parts(1).split(' ').map(x => x.toDouble).toArray)
       }
<console>:28: error: polymorphic expression cannot be instantiated to expected type;
 found   : [U >: Double]Array[U]
 required: org.apache.spark.mllib.linalg.Vector
         LabeledPoint(parts(0).toDouble, parts(1).split(' ').map(x => x.toDouble).toArray)

The error says that org.apache.spark.mllib.linalg.Vector is required, but importing it does not help. Even after trying several ways of converting to a Vector, I still get:

<console>:19: error: type mismatch; found : scala.collection.immutable.Vector[Array[Double]]

1 answer:

Answer 0: (score: 3)

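The problem is caused by changes in the newer version. Code that used to work in v0.9.1 now needs adjusting for v1.0. You can find the latest docs here. The solution is to import Vectors rather than Vector, despite what the error tells you. Try: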

import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors

// Load and parse the data
val data = sc.textFile("mllib/data/ridge-data/lpsa.data")
val parsedData = data.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(x => x.toDouble)))
}
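To make the Vector/Vectors distinction concrete, here is a minimal sketch that parses a single line without needing an RDD. It assumes the spark-mllib 1.0 jar is on the classpath (as it is in spark-shell); `parseLine` and the sample line are hypothetical names and data made up for illustration, not part of Spark or of lpsa.data. As far as I understand it, `Vector` in MLlib is only a trait with no `apply` factory, so writing `Vector(...)` as an expression most likely resolves to Scala's own `scala.collection.immutable.Vector`, which would explain the second error in the question; the `Vectors` object is the factory that builds MLlib vectors.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint

// Hypothetical helper (not part of Spark): parse one "label,f1 f2 f3" line.
def parseLine(line: String): LabeledPoint = {
  val parts = line.split(',')
  // Vectors.dense takes an Array[Double] and returns an MLlib Vector,
  // which is the type LabeledPoint expects for its features in v1.0.
  val features: Vector = Vectors.dense(parts(1).trim.split(' ').map(_.toDouble))
  LabeledPoint(parts(0).toDouble, features)
}

// Made-up sample line in the same "label,features" layout as the answer's code.
val sample = parseLine("1.5,0.1 0.2 0.3")
println(sample.label)
println(sample.features.size)
```

The same `parseLine` function could be passed to `data.map(parseLine)` in place of the inline closure above, which keeps the parsing logic easy to check on a single string before running it over the whole file.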