Spark MLlib random forest model: featureImportances is null

Date: 2016-01-13 23:33:41

Tags: apache-spark apache-spark-mllib

I have a random forest model and I am trying to get its featureImportances vector.

 
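import java.util.HashMap;
import java.util.Map;
import scala.collection.JavaConversions;
import org.apache.spark.ml.classification.RandomForestClassificationModel;
import org.apache.spark.ml.classification.RandomForestClassifier;
// Empty map (no categorical features), converted to an immutable Scala map.
Map<Object, Object> categoricalFeaturesParam = new HashMap<>();
scala.collection.immutable.Map<Object, Object> categoricalFeatures = (scala.collection.immutable.Map<Object, Object>)
        scala.collection.immutable.Map$.MODULE$.apply(JavaConversions.mapAsScalaMap(categoricalFeaturesParam).toSeq());
int numberOfClasses = 2;
RandomForestClassifier rfc = new RandomForestClassifier();
// `model` is the previously trained random forest model (its definition is not shown here).
RandomForestClassificationModel rfm = RandomForestClassificationModel.fromOld(model, rfc, categoricalFeatures, numberOfClasses);
System.out.println(rfm.featureImportances());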

When I run the code above, featureImportances comes back as null. Do I need to set something specific to get the feature importances of a random forest model?

I also tried Spark 1.6, where the API takes numberOfFeatures as a fifth argument, but I still get null for featureImportances.

RandomForestClassifier rfc = getRandomForestClassifier(numTrees, maxBinSize, maxTreeDepth, seed, impurity);
RandomForestClassificationModel rfm = RandomForestClassificationModel.fromOld(model, rfc, categoricalFeatures, numberOfClasses, numberOfFeatures);
System.out.println(rfm.featureImportances());

Stack trace:

Exception in thread "main" java.lang.NullPointerException
        at org.apache.spark.ml.tree.impl.RandomForest$.computeFeatureImportance(RandomForest.scala:1152)
        at org.apache.spark.ml.tree.impl.RandomForest$$anonfun$featureImportances$1.apply(RandomForest.scala:1111)
        at org.apache.spark.ml.tree.impl.RandomForest$$anonfun$featureImportances$1.apply(RandomForest.scala:1108)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at org.apache.spark.ml.tree.impl.RandomForest$.featureImportances(RandomForest.scala:1108)
        at org.apache.spark.ml.classification.RandomForestClassificationModel.featureImportances$lzycompute(RandomForestClassifier.scala:237)
        at org.apache.spark.ml.classification.RandomForestClassificationModel.featureImportances(RandomForestClassifier.scala:237)
        at com.markmonitor.antifraud.ce.ml.CheckFeatureImportance.main(CheckFeatureImportance.java:49)
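
For context on what a method like computeFeatureImportance has to do, here is a minimal, Spark-free sketch of impurity-based feature importance: each split adds its gain, weighted by the number of training instances that reached it, to the feature it splits on, and the totals are then normalized. This is not Spark's code; the Node and Stats classes and their fields are hypothetical. It also assumes, without confirmation from the source, that a model converted from another representation may lack per-node statistics, which is one way a null could surface during the traversal.

```java
import java.util.Arrays;

public class FeatureImportanceSketch {

    /** Hypothetical per-node statistics (here only the instance count). */
    public static final class Stats {
        final double count;
        public Stats(double count) { this.count = count; }
    }

    /** Hypothetical tree node; a leaf is marked by featureIndex == -1. */
    public static final class Node {
        final int featureIndex;
        final double gain;
        final Stats stats;   // assumption: may be null on a converted model
        final Node left, right;

        private Node(int featureIndex, double gain, Stats stats, Node left, Node right) {
            this.featureIndex = featureIndex;
            this.gain = gain;
            this.stats = stats;
            this.left = left;
            this.right = right;
        }

        public static Node leaf() {
            return new Node(-1, 0.0, null, null, null);
        }

        public static Node split(int featureIndex, double gain, Stats stats, Node left, Node right) {
            return new Node(featureIndex, gain, stats, left, right);
        }
    }

    /** Adds gain * instance count of every split below node to its feature's total. */
    static void accumulate(Node node, double[] importances) {
        if (node == null || node.featureIndex < 0) {
            return; // leaves contribute nothing
        }
        if (node.stats == null) {
            // Fail with a clear message instead of a bare NullPointerException.
            throw new IllegalStateException("split node has no statistics");
        }
        importances[node.featureIndex] += node.gain * node.stats.count;
        accumulate(node.left, importances);
        accumulate(node.right, importances);
    }

    /** Returns importances for one tree, normalized to sum to 1 when any split exists. */
    public static double[] featureImportances(Node root, int numFeatures) {
        double[] importances = new double[numFeatures];
        accumulate(root, importances);
        double total = Arrays.stream(importances).sum();
        if (total > 0) {
            for (int i = 0; i < importances.length; i++) {
                importances[i] /= total;
            }
        }
        return importances;
    }

    public static void main(String[] args) {
        Node child = Node.split(1, 0.25, new Stats(40), Node.leaf(), Node.leaf());
        Node root = Node.split(0, 0.5, new Stats(60), child, Node.leaf());
        System.out.println(Arrays.toString(featureImportances(root, 3)));
    }
}
```

In this sketch a split node without statistics raises an explicit IllegalStateException, which is easier to diagnose than the NullPointerException in the trace above. Whether the same cause applies to the model in the question is not established here.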

0 answers:

No answers yet