Decision tree error when training a classifier in Scala

Time: 2016-10-27 05:30:30

Tags: scala apache-spark decision-tree

val pdata = sc.parallelize(Seq(data))
val parsedData = data.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(_.toDouble)))
}.cache()

// Split the data into training and test sets (30% held out for testing)
val splits = parsedData.randomSplit(Array(0.7, 0.3))
val (trainingData, testData) = (splits(0), splits(1))

// Train a DecisionTree model.
val numClasses = 2
val categoricalFeaturesInfo = {}
val impurity = "gini"
val maxDepth = 5
val maxBins = 32

val model = DecisionTree.trainClassifier(trainingData, numClasses, categoricalFeaturesInfo, impurity, maxDepth, maxBins)

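I wrote this code to build a decision tree classification model on the given data. The first column is the column to predict. It throws an error saying "overloaded method value trainClassifier with alternatives:"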
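
A likely cause, offered as a sketch rather than a confirmed diagnosis: in Scala, `{}` is an empty block whose value has type `Unit`; it is not an empty map as it would be in Python. `DecisionTree.trainClassifier` in `org.apache.spark.mllib.tree` expects a `Map[Int, Int]` for `categoricalFeaturesInfo` (its Java-friendly overload takes a `java.util.Map`), so neither alternative accepts a `Unit` argument. The example below assumes a local `SparkContext` and three hard-coded rows standing in for the real input; the object name `DecisionTreeSketch` and the helper `train` are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.tree.model.DecisionTreeModel

object DecisionTreeSketch {
  // Hypothetical helper: trains on three rows copied from the sample data.
  def train(sc: SparkContext): DecisionTreeModel = {
    val trainingData = sc.parallelize(Seq(
      LabeledPoint(1.0, Vectors.dense(2.0, 50.0, 12500.0, 98.0)),
      LabeledPoint(0.0, Vectors.dense(1.0, 24.0, 6000.0, 77.0)),
      LabeledPoint(0.0, Vectors.dense(4.0, 4.0, 1000.0, 4.0))
    ))

    // An empty Scala Map means "all features are continuous".
    // Writing `{}` here would produce a value of type Unit instead.
    val categoricalFeaturesInfo = Map[Int, Int]()

    DecisionTree.trainClassifier(
      trainingData, 2, categoricalFeaturesInfo, "gini", 5, 32)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is enough for this illustration.
    val conf = new SparkConf().setAppName("DecisionTreeSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try println(train(sc).toDebugString)
    finally sc.stop()
  }
}
```

If the call compiles with `Map[Int, Int]()` in place of `{}`, the type of `categoricalFeaturesInfo` was the mismatch the compiler reported.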

Here is my sample input data:
1 2 50 12500 98
1 0 13 3250 28
1 1 16 4000 35
1 2 20 5000 45
0 1 24 6000 77
0 4 4 1000 4
1 2 7 1750 14
0 1 12 3000 35
1 2 9 2250 22
1 5 46 11500 98
0 4 23 5750 58
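
The sample rows are separated by spaces, whereas the posted code first splits each line on a comma. A minimal sketch of parsing such rows, assuming every column is numeric and the first column is the label; `ParseSketch` and `parseLine` are hypothetical names. In Spark, the result could be wrapped as `LabeledPoint(label, Vectors.dense(features))`.

```scala
object ParseSketch {
  // Hypothetical helper: split on whitespace, then treat the first
  // value as the label and the remaining values as the features.
  def parseLine(line: String): (Double, Array[Double]) = {
    val values = line.trim.split("\\s+").map(_.toDouble)
    (values.head, values.tail)
  }

  def main(args: Array[String]): Unit = {
    val (label, features) = parseLine("1 2 50 12500 98")
    println(s"label=$label features=${features.mkString(",")}")
  }
}
```

Splitting on the regular expression `\s+` tolerates repeated spaces or tabs between columns, which a single-character split would turn into empty strings.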

0 answers:

No answers yet