Is there any way to show that random forest is nonlinear, using a hypothetical 100 attributes?

Time: 2015-06-22 04:13:35

Tags: random-forest

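Is there any way to show that random forest is nonlinear, using a hypothetical 100 attributes? I actually compared the accuracy of J48 with random forest. Random forest performs better because it is well suited to nonlinearity, so how can I prove or demonstrate that?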

1 answer:

Answer 0: (score: 1)

First, a simple demonstration that RF can capture some nonlinear signal:

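library(randomForest)
obs = 2500
vars = 100
X = data.frame(replicate(vars, runif(obs, -2, 2)))
#the response is a purely quadratic (nonlinear) function of every variable
y = apply(X, 1, function(x) sum(x^2))
rfo = randomForest(X, y)
print(rfo) #reports the out-of-bag "% Var explained"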

This shows empirically that randomForest can fit a nonlinear signal: the explained variance is 85% (out-of-bag cross-validation).
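To make the contrast with a linear method concrete, here is a minimal base-R sketch of what a linear baseline does on the same kind of signal. It is not part of the original answer: the seed, the reduced `vars = 10` (chosen only to keep the fit quick), and the helper `r2` are illustrative assumptions.

```r
# Sketch: a linear model cannot see a sum-of-squares signal,
# but the same model class fits it once given squared features.
set.seed(1)                       # arbitrary seed, for reproducibility only
obs  <- 2500
vars <- 10                        # assumption: 10 instead of 100, to keep this fast
X <- data.frame(replicate(vars, runif(obs, -2, 2)))
y <- apply(X, 1, function(x) sum(x^2))

# Hypothetical helper: in-sample R^2 computed from the residuals
r2 <- function(fit, y) 1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)

# Purely linear fit: y ~ X1 + ... + X10
lin_fit <- lm(y ~ ., data = X)
lin_r2  <- r2(lin_fit, y)

# Same linear machinery, but on squared inputs
X_sq   <- as.data.frame(X^2)
sq_fit <- lm(y ~ ., data = X_sq)
sq_r2  <- r2(sq_fit, y)

print(c(linear = lin_r2, squared = sq_r2))
```

The linear fit explains almost none of the variance because each x^2 term is symmetric around zero, so any variance a random forest explains on such data has to come from nonlinear structure.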

Second, you can inspect the curvature of any random forest model, as in the following example:

I use the forestFloor package to visualize an RF model trained on a dataset sampled from a hidden nonlinear function:

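y = f(x) = x_1^2 + sin(pi * x_2) + 2 * x_3 * x_4 + noise, where X is sampled from a normal distribution. The curvature that is found reproduces the hidden nonlinear function.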

library(forestFloor)
library(randomForest)
#simulate data
obs=2500
vars = 6 
X = data.frame(replicate(vars,rnorm(obs)))
Y = with(X, X1^2 + sin(X2*pi) + 2 * X3 * X4 + 1 * rnorm(obs))

#grow a forest, remember to include inbag
rfo=randomForest(X,Y,keep.inbag = TRUE,sampsize=1500,ntree=500)

#compute/extract mapping curvature
ff = forestFloor(rfo,X)

#plot partial functions of most important variables first
plot(ff) 

#Non-interacting functions are well displayed, whereas X3 and X4 are not
#by applying a different colour gradient, interactions reveal themselves
Col = fcol(ff,3,orderByImportance=FALSE)
plot(ff,col=Col,plot_GOF=TRUE) 

#in 3D the interaction between X3 and X4 reveals itself completely
show3d(ff,3:4,col=Col,plot.rgl=list(size=5),orderByImportance=FALSE) 

#although there is no interaction, X1 and X2 have a joint additive effect
#colour by FC-component FC1 and FC2 summed
Col = fcol(ff,1:2,orderByImportance=FALSE,X.m=FALSE,RGB=TRUE)
plot(ff,col=Col) 
show3d(ff,1:2,col=Col,plot.rgl=list(size=5),orderByImportance=FALSE) 

#...or two-way gradient is formed from FC-component X1 and X2.
Col = fcol(ff,1:2,orderByImportance=FALSE,X.matrix=TRUE,alpha=0.8) 
plot(ff,col=Col) 
show3d(ff,1:2,col=Col,plot.rgl=list(size=5),orderByImportance=FALSE)
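
The plots above are visual. As a rough numeric cross-check that needs only randomForest, one can probe the fitted forest along a single variable and compare its response with a straight line and with a parabola. This is a sketch rather than part of the forestFloor workflow: the seed, `ntree = 200`, and the names `grid`, `probe`, and `curve_check` are illustrative assumptions.

```r
library(randomForest)

set.seed(1)                                   # arbitrary seed, for reproducibility only
obs  <- 2500
vars <- 6
X <- data.frame(replicate(vars, rnorm(obs)))
Y <- with(X, X1^2 + sin(X2 * pi) + 2 * X3 * X4 + rnorm(obs))

rfo <- randomForest(X, Y, ntree = 200)        # assumption: fewer trees, for speed

# Probe path: X1 runs from -2 to 2 while all other variables are held at 0,
# so the noiseless hidden function along this path is exactly X1^2.
grid  <- seq(-2, 2, length.out = 41)
probe <- as.data.frame(matrix(0, nrow = length(grid), ncol = vars,
                              dimnames = list(NULL, names(X))))
probe$X1 <- grid
pred <- predict(rfo, probe)

# Correlate the forest's response with a line (x) and with a parabola (x^2)
curve_check <- c(cor_with_x  = cor(pred, grid),
                 cor_with_x2 = cor(pred, grid^2))
print(round(curve_check, 2))
```

If the forest has learned the quadratic shape, the correlation with `grid^2` should be clearly higher than the correlation with `grid` itself.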