How to choose the combining strategy for MLlib's random forest

Time: 2016-06-07 16:28:37

Tags: scala apache-spark random-forest apache-spark-mllib

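Is it possible to choose the combining strategy for MLlib's random forest? I could not find any hint in the official API documentation.

Here is my code: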

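import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest

val numClasses = 10
val categoricalFeaturesInfo = Map[Int, Int]() // empty map: all features are continuous
val numTrees = 10
val featureSubsetStrategy = "auto" // let MLlib pick the number of features per split
val impurity = "entropy"
val maxDepth = 2
val maxBins = 320

val model = RandomForest.trainClassifier(trainData, numClasses, categoricalFeaturesInfo,
  numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins)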

val predictionAndLabels = testData.map { case LabeledPoint(label, features) =>
  val prediction = model.predict(features)
  (prediction, label)
}

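I know that the predict method (implemented in the TreeEnsembleModel class, in treeEnsembleModels.scala) takes the combining strategy (Sum, Average or Vote) into account: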

def predict(features: Vector): Double = {
    (algo, combiningStrategy) match {
      case (Regression, Sum) =>
        predictBySumming(features)
      case (Regression, Average) =>
        predictBySumming(features) / sumWeights
      case (Classification, Sum) => // binary classification
        val prediction = predictBySumming(features)
        // TODO: predicted labels are +1 or -1 for GBT. Need a better way to store this info.
        if (prediction > 0.0) 1.0 else 0.0
      case (Classification, Vote) =>
        predictByVoting(features)
      case _ =>
        throw new IllegalArgumentException(
          "TreeEnsembleModel given unsupported (algo, combiningStrategy) combination: " +
            s"($algo, $combiningStrategy).")
    }
}
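
To make the three strategies concrete, here is a minimal, Spark-free sketch of what each one computes from the per-tree predictions. The object `CombineSketch` and its method names are hypothetical, and the code only approximates the behaviour suggested by the `predict` method quoted above; it is not Spark's actual implementation.

```scala
// Simplified illustration of the three combining strategies; not Spark's code.
object CombineSketch {
  // Sum: weighted sum of the per-tree predictions.
  def sum(preds: Seq[Double], weights: Seq[Double]): Double =
    preds.zip(weights).map { case (p, w) => p * w }.sum

  // Average: weighted sum divided by the total weight.
  def average(preds: Seq[Double], weights: Seq[Double]): Double =
    sum(preds, weights) / weights.sum

  // Vote: each tree votes for its predicted label with its weight;
  // the label with the largest total weight wins (ties are broken arbitrarily).
  def vote(preds: Seq[Double], weights: Seq[Double]): Double =
    preds.zip(weights)
      .groupBy { case (label, _) => label }
      .map { case (label, pairs) => label -> pairs.map(_._2).sum }
      .maxBy { case (_, totalWeight) => totalWeight }
      ._1
}
```

With a trained MLlib model, the per-tree predictions could presumably be obtained with `model.trees.map(_.predict(features))` and then combined with any function of this kind, which would avoid touching the model's internals at all.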

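1 answer:

Answer 0 (score: 0)

I would say the only way to do this is to use reflection after the model has been built. This should be possible because the field is only read lazily, when predict is called. (I have not tried running this code, but something like it should work.)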

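import java.lang.reflect.Field;
RandomForestModel model = ...;
// Untested: the field is assumed to be declared on the parent class (TreeEnsembleModel),
// so it is looked up there rather than on RandomForestModel itself.
Class<?> c = model.getClass().getSuperclass();
Field strategy = c.getDeclaredField("combiningStrategy");
strategy.setAccessible(true); // the field is not public
strategy.set(model, whatever); // whatever: the combining strategy value to use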
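
Because `getDeclaredField` only finds fields declared directly on the class it is called on, the lookup may need to walk up the superclass chain, and the field has to be made accessible before it can be written. The following is a minimal sketch of that technique on toy classes. `ToyEnsemble`, `ToyForest` and `ReflectionSketch` are hypothetical stand-ins, and whether the same approach works against Spark's real `RandomForestModel` is an untested assumption.

```scala
import java.lang.reflect.Field

// Hypothetical stand-ins for the ensemble base class and the forest model.
class ToyEnsemble(protected val combiningStrategy: String)
class ToyForest extends ToyEnsemble("Vote") {
  def strategy: String = combiningStrategy
}

object ReflectionSketch {
  // Walk up the class hierarchy until some class declares the named field.
  // If no class declares it, the NoSuchFieldException propagates to the caller.
  def findField(cls: Class[_], name: String): Field =
    try cls.getDeclaredField(name)
    catch {
      case _: NoSuchFieldException if cls.getSuperclass != null =>
        findField(cls.getSuperclass, name)
    }

  // Overwrite the (private, final) field on an already constructed instance.
  def setStrategy(model: AnyRef, value: AnyRef): Unit = {
    val field = findField(model.getClass, "combiningStrategy")
    field.setAccessible(true) // bypass the access check on the private field
    field.set(model, value)
  }
}
```

Usage would look like `ReflectionSketch.setStrategy(model, newStrategy)`. Note that the `predict` method quoted in the question only accepts certain pairs, so for a classifier only Sum or Vote would avoid the IllegalArgumentException.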