Error: Scala decision tree implementation fails with "overloaded method value trainClassifier with alternatives"

Time: 2018-12-09 00:08:49

Tags: scala apache-spark-mllib decision-tree

I am trying to implement a decision tree by following the examples at https://spark.apache.org/docs/latest/mllib-decision-tree.html#examples. My sample code is:

val splits = predictionsNewdfNew.randomSplit(Array(0.7, 0.3))
val (trainingData, testData) = (splits(0), splits(1))
val numClasses = 3
val categoricalFeaturesInfo = Map[Int, Int]()
val impurity = "gini"
val maxDepth = 5
val maxBins = 32
val model2 = DecisionTree.trainClassifier(trainingData, numClasses, categoricalFeaturesInfo, impurity, maxDepth, maxBins)

'predictionsNewdfNew' is a DataFrame whose rows look like this, for example:

|Apn5Q_b6Nz61Tq4Xz...|51.0918130155|-114.031674872|  3| 24|4.0|        good|
|AjEbIBw6ZFfln7ePH...|   35.9607337|   -114.939821|  3|  3|4.5|satisfactory|
|bFzdJJ3wp3PZssNEs...|   33.4499993|  -112.0769793|  7|  8|1.5|         bad|

The last column is the label. The error is:

overloaded method value trainClassifier with alternatives: 
(input: org.apache.spark.api.java.JavaRDD[org.apache.spark.mllib.regression.LabeledPoint],
numClasses: Int,
categoricalFeaturesInfo: java.util.Map[Integer,Integer],
impurity: String,
maxDepth: Int,
maxBins: Int)org.apache.spark.mllib.tree.model.DecisionTreeModel 
<and> (input: org.apache.spark.rdd.RDD[org.apache.spark.mllib.regression.LabeledPoint],
numClasses: Int,
categoricalFeaturesInfo: Map[Int,Int],
impurity: String,
maxDepth: Int,
maxBins: Int)
org.apache.spark.mllib.tree.model.DecisionTreeModel cannot be applied to (org.apache.spark.sql.Dataset[org.apache.spark.sql.Row], String, Int, Int)

Can someone help me understand what is wrong with the syntax here?

Thanks.
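
A note on reading the error: both alternatives listed by the compiler take an RDD (or JavaRDD) of LabeledPoint as the first argument, while the call passes a Dataset[Row], so the mismatch appears to be in the input type rather than in the syntax. A minimal sketch of one possible conversion follows, written for spark-shell. It rests on assumptions that the question does not confirm: that column 0 is an ID, that columns 1 to 5 are numeric features, that column 6 holds the label, and that the three label strings map to class indices as shown. The names labelIndex and toLabeledPoint are hypothetical helpers introduced only for illustration.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

// Assumption: the three string labels map to class indices 0, 1, 2.
// MLlib classification expects labels in the range 0 until numClasses.
val labelIndex: Map[String, Double] =
  Map("bad" -> 0.0, "satisfactory" -> 1.0, "good" -> 2.0)

// Assumption: column 0 is an ID, columns 1-5 are numeric features,
// and column 6 is the label. Adjust the positions to the real schema.
def toLabeledPoint(row: Row): LabeledPoint = {
  // Going through toString tolerates columns stored as strings or numbers.
  val features = (1 to 5).map(i => row.get(i).toString.toDouble).toArray
  val label = labelIndex(row.get(6).toString.trim)
  LabeledPoint(label, Vectors.dense(features))
}

// Convert the DataFrame from the question to the RDD type the method expects.
val data: RDD[LabeledPoint] = predictionsNewdfNew.rdd.map(toLabeledPoint)
val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3))

// Same parameters as in the question: 3 classes, no categorical features,
// gini impurity, maxDepth 5, maxBins 32.
val model2 = DecisionTree.trainClassifier(
  trainingData, 3, Map[Int, Int](), "gini", 5, 32)
```

The argument list reported in the error, (Dataset[Row], String, Int, Int), has four entries whereas the snippet above it passes six, so the code that was actually compiled may differ from what is shown in the question. The sketch addresses only the input type.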

0 answers:

No answers