Using R

Time: 2018-04-09 21:09:22

Tags: r machine-learning random-forest

I am very new to machine learning, and I have stumbled upon a problem that I cannot solve no matter how hard I try.

I ran a multi-class classification procedure using the randomForest algorithm and found a model that predicts my test sample adequately. I then used varImpPlot() to determine which predictors were most important in determining class assignment.

My question: I would like to know why these predictors are the most important. Specifically, I would like to be able to report that cases belonging to class X hold characteristics A (e.g., male), B (e.g., older), and C (e.g., high IQ), while cases belonging to class Y hold characteristics D (female), E (younger), and F (low IQ), and so on for my other classes.

I know that standard binary logistic regression lets you say that cases with high values on characteristic A are more likely to belong to class X. So I am hoping for something conceptually similar, but from a random forest classification model with multiple classes.

Can this be done with a random forest model? If so, are there functions in randomForest or caret (or even elsewhere) that could help me go beyond the varImpPlot() or varImp() tables?
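For reference, the setup described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical data frame `df` with a factor outcome column `class`; it shows what varImpPlot() and importance() give you, and why they report magnitude of importance but not direction of effect:

```r
# Minimal sketch of the setup above (assumes a data frame `df`
# with a factor outcome `class`; both names are placeholders).
library(randomForest)

set.seed(42)
rf <- randomForest(class ~ ., data = df, importance = TRUE)

varImpPlot(rf)   # plots MeanDecreaseAccuracy / MeanDecreaseGini
importance(rf)   # the underlying table; with importance = TRUE it
                 # includes per-class columns, but still says nothing
                 # about *which direction* a predictor pushes a class
```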

Thanks!

1 answer:

Answer 0: (score: 0)

There is a package called ExplainPrediction that promises explanations of predictions from random forest models. Here is the top of its DESCRIPTION file; the URL page contains a link to an extensive citation list:

Package: ExplainPrediction
Title: Explanation of Predictions for Classification and Regression Models
Version: 1.3.0
Date: 2017-12-27
Author: Marko Robnik-Sikonja
Maintainer: Marko Robnik-Sikonja <marko.robnik@fri.uni-lj.si>
Description: Generates explanations for classification and regression models and visualizes them.
 Explanations are generated for individual predictions as well as for models as a whole. Two explanation methods
 are included, EXPLAIN and IME. The EXPLAIN method is fast but might miss explanations expressed redundantly
 in the model. The IME method is slower as it samples from all feature subsets.
 For the EXPLAIN method see Robnik-Sikonja and Kononenko (2008) <doi:10.1109/TKDE.2007.190734>, 
 and the IME method is described in Strumbelj and Kononenko (2010, JMLR, vol. 11:1-18).
 All models in package 'CORElearn' are natively supported, for other prediction models a wrapper function is provided 
 and illustrated for models from packages 'randomForest', 'nnet', and 'e1071'.
License: GPL-3
URL: http://lkm.fri.uni-lj.si/rmarko/software/
Imports: CORElearn (>= 1.52.0),semiArtificial (>= 2.2.5)
Suggests: nnet,e1071,randomForest
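A minimal sketch of how this package might be used with a randomForest model. The entry point is `explainVis()`; the exact argument names below are assumptions based on the package documentation, so check `?explainVis` before relying on them:

```r
# Hedged sketch: EXPLAIN-method explanations for a randomForest model.
# Argument names are assumptions; consult ?explainVis in the package.
library(randomForest)
library(ExplainPrediction)

rf <- randomForest(Species ~ ., data = iris)

# Generates and visualizes per-instance and model-level explanations.
# Models from 'randomForest' are supported via the package's wrapper.
explainVis(rf, iris, iris, method = "EXPLAIN", visLevel = "model")
```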

Additionally:

Package: DALEX
Title: Descriptive mAchine Learning EXplanations
Version: 0.1.1
Authors@R: person("Przemyslaw", "Biecek", email = "przemyslaw.biecek@gmail.com", role = c("aut", "cre"))
Description: Machine Learning (ML) models are widely used and have various applications in classification 
  or regression. Models created with boosting, bagging, stacking or similar techniques are often
  used due to their high performance, but such black-box models usually lack of interpretability.
  'DALEX' package contains various explainers that help to understand the link between input variables and model output.
  The single_variable() explainer extracts conditional response of a model as a function of a single selected variable.
  It is a wrapper over packages 'pdp' and 'ALEPlot'.
  The single_prediction() explainer attributes parts of model prediction to particular variables used in the model.
  It is a wrapper over 'breakDown' package.
  The variable_dropout() explainer assess variable importance based on consecutive permutations.
  All these explainers can be plotted with generic plot() function and compared across different models.
Depends: R (>= 3.0)
License: GPL
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1.9000
Imports: pdp, ggplot2, ALEPlot, breakDown
Suggests: gbm, randomForest, xgboost
URL: https://pbiecek.github.io/DALEX/
BugReports: https://github.com/pbiecek/DALEX/issues
NeedsCompilation: no
Packaged: 2018-02-28 01:44:36 UTC; pbiecek
Author: Przemyslaw Biecek [aut, cre]
Maintainer: Przemyslaw Biecek <przemyslaw.biecek@gmail.com>
Repository: CRAN
Date/Publication: 2018-02-28 16:36:14 UTC
Built: R 3.4.3; ; 2018-04-03 03:04:04 UTC; unix
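A sketch of the DALEX workflow using the three explainers named in the DESCRIPTION above. This is an illustration only: `df` and the class name `"X"` are placeholders, and since these explainers work on a numeric model response, a multi-class forest is explained here one class at a time via its predicted probability:

```r
# Hedged sketch: explaining one class of a multi-class randomForest
# with DALEX 0.1.x. `df` and class "X" are hypothetical placeholders.
library(randomForest)
library(DALEX)

rf <- randomForest(class ~ ., data = df)

# Wrap the model; extract the predicted probability of class "X"
expl <- explain(rf,
                data = df[, setdiff(names(df), "class")],
                y = as.numeric(df$class == "X"),
                predict_function = function(m, x)
                  predict(m, x, type = "prob")[, "X"])

# Marginal response to one predictor (wrapper over 'pdp'/'ALEPlot')
plot(single_variable(expl, variable = "age", type = "pdp"))

# Per-variable contributions to a single prediction (via 'breakDown')
plot(single_prediction(expl, observation = df[1, ]))

# Permutation-based variable importance
plot(variable_dropout(expl))
```

Plotting the single_variable() output per class is probably the closest match to the question: it shows whether, say, higher age raises or lowers the predicted probability of class X, which is the directional statement a logistic regression coefficient would give you.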