LinearRegressionWithSGD.train: "error: type mismatch"

Posted: 2018-09-07 06:36:28

Tags: scala apache-spark

I am using Spark with Scala:

import org.apache.spark.mllib.feature.StandardScaler
val scaler = new StandardScaler(withMean = true, withStd = true).fit(
  labeledPoints.rdd.map(x => x.features)
)

val scaledLabledPoints = labeledPoints.map{ x =>
  LabeledPoint(x.label, scaler.transform(x.features))
} 

import org.apache.spark.mllib.regression.LinearRegressionWithSGD
val numIter = 20
scaledLabledPoints.cache

val linearRegressionModel = LinearRegressionWithSGD.train(scaledLabledPoints, numIter)

This error occurs on the last line:

<console>:64: error: type mismatch;
 found   :  org.apache.spark.sql.Dataset[org.apache.spark.mllib.regression.LabeledPoint]
 required: org.apache.spark.rdd.RDD[org.apache.spark.mllib.regression.LabeledPoint]
   val linearRegressionModel = LinearRegressionWithSGD.train(scaledLabledPoints, numIter)
                                                             ^

How can I fix this error, and why does it happen?

1 answer:

Answer 0 (score: 1)

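Hey, you are working with DataFrames and Datasets but calling the old RDD-based Spark MLlib API. Because labeledPoints is a Dataset, labeledPoints.map also returns a Dataset, so scaledLabledPoints is a Dataset[LabeledPoint], while LinearRegressionWithSGD.train accepts only an RDD[LabeledPoint]. That is exactly the found/required pair in the error message. You should use the DataFrame-based ML API instead: the org.apache.spark.ml package (not mllib).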
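A minimal sketch of the same scale-then-fit pipeline with the org.apache.spark.ml API, assuming Spark 2.x or later in local mode. The object name MlApiSketch, the column name scaledFeatures, and the tiny in-memory dataset are hypothetical, made up for illustration. Note that ml.regression.LinearRegression is not an SGD implementation: with its default solver it chooses between a normal-equation solver and L-BFGS, so this swaps the optimizer as well as the API.

```scala
import org.apache.spark.ml.feature.StandardScaler
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

object MlApiSketch {
  // Fits a linear model on a small hypothetical dataset and returns
  // (coefficients, intercept) so the result can be inspected.
  def run(): (Array[Double], Double) = {
    val spark = SparkSession.builder()
      .appName("MlApiSketch")
      .master("local[*]")
      .getOrCreate()
    try {
      // Hypothetical data generated from label = 3 * x1 + 2 * x2 + 1.
      // The features column uses ml.linalg vectors, not mllib ones.
      val training = spark.createDataFrame(Seq(
        (6.0, Vectors.dense(1.0, 1.0)),
        (9.0, Vectors.dense(2.0, 1.0)),
        (8.0, Vectors.dense(1.0, 2.0)),
        (13.0, Vectors.dense(2.0, 3.0)),
        (16.0, Vectors.dense(3.0, 3.0))
      )).toDF("label", "features")

      // DataFrame-based StandardScaler: fit on a column, append a new column.
      val scalerModel = new StandardScaler()
        .setInputCol("features")
        .setOutputCol("scaledFeatures")
        .setWithMean(true)
        .setWithStd(true)
        .fit(training)
      val scaled = scalerModel.transform(training)

      // The estimator consumes the DataFrame directly; no .rdd conversion.
      val lr = new LinearRegression()
        .setFeaturesCol("scaledFeatures")
        .setLabelCol("label")
        .setMaxIter(20)
      val model = lr.fit(scaled)
      (model.coefficients.toArray, model.intercept)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (coefficients, intercept) = run()
    println(s"coefficients=${coefficients.mkString(", ")} intercept=$intercept")
  }
}
```

Because the scaled features are mean-centred, the fitted intercept should equal the mean label (10.4 for this data). If you start from the question's Dataset of mllib LabeledPoint values, the mllib LabeledPoint class provides an asML method (Spark 2.0+) that should convert each point to the ml.feature.LabeledPoint type this API expects.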

If you still want to use the MLlib API, convert the Dataset to an RDD with .rdd when calling train:

val linearRegressionModel = LinearRegressionWithSGD.train(scaledLabledPoints.rdd, numIter)
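A variant of the same MLlib fix that converts to an RDD once, before scaling and caching, so the object that gets cached is the same RDD that train iterates over. MLlib's SGD makes one pass over its input per iteration, and its generalized linear algorithms log a warning when that input RDD is not itself cached, which should be the case for scaledLabledPoints.rdd even after scaledLabledPoints.cache. This is a sketch assuming Spark 2.x, which the question targets; LinearRegressionWithSGD has been deprecated since 2.0 and, as far as I know, was removed in Spark 3.0. The object name RddApiSketch and the sample data are hypothetical.

```scala
import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}
import org.apache.spark.sql.SparkSession

object RddApiSketch {
  // Returns the fitted weight vector as an array.
  def run(): Array[Double] = {
    val spark = SparkSession.builder()
      .appName("RddApiSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._
    try {
      // Hypothetical stand-in for the question's labeledPoints Dataset.
      val labeledPoints = Seq(
        LabeledPoint(6.0, Vectors.dense(1.0, 1.0)),
        LabeledPoint(9.0, Vectors.dense(2.0, 1.0)),
        LabeledPoint(8.0, Vectors.dense(1.0, 2.0)),
        LabeledPoint(13.0, Vectors.dense(2.0, 3.0))
      ).toDS()

      // Leave the Dataset API once, up front; everything below is an RDD.
      val pointsRdd = labeledPoints.rdd
      val scaler = new StandardScaler(withMean = true, withStd = true)
        .fit(pointsRdd.map(_.features))

      // RDD.map returns an RDD[LabeledPoint], which is what train requires.
      val scaledLabeledPoints = pointsRdd.map { p =>
        LabeledPoint(p.label, scaler.transform(p.features))
      }
      // Cache the exact RDD that the SGD loop will scan on every iteration.
      scaledLabeledPoints.cache()

      val numIter = 20
      val model = LinearRegressionWithSGD.train(scaledLabeledPoints, numIter)
      model.weights.toArray
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(run().mkString(", "))
}
```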