How can I detect heteroscedasticity in a random forest model?

Time: 2016-04-06 12:53:29

Tags: r random-forest

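I am working on a regression model with random forest, and I want to determine whether heteroscedasticity is present in the model.

When I fit a linear model, I could see heteroscedasticity in the residual plot shown below, and I would like to check a similar residual plot for the random forest model.

I am working in R.

It's an expense model based on Income, Branch, and TotalFamilyMember.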

[Image: residual plot from the linear model]

1 Answer:

Answer 0: (score: 0)

We can rebuild the plot from residuals computed from the predicted values:

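#Using the regression example from ?randomForest
library(randomForest)

#Assumption: airq is the built-in airquality data with incomplete rows dropped
airq <- na.omit(airquality)
set.seed(131)
ozone.rf <- randomForest(Ozone ~ ., data=airq, mtry=3,
                         importance=TRUE)

#Find residuals by subtracting predicted from actual values
#(ozone.rf$predicted holds the out-of-bag predictions)
err <- airq$Ozone - ozone.rf$predicted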

#Make data frame holding residuals and fitted values
df <- data.frame(Residuals=err, Fitted.Values=ozone.rf$predicted)

#Sort data by fitted values
df2 <- df[order(df$Fitted.Values),]

#Create plot
plot(Residuals~Fitted.Values, data=df2)

#Add a horizontal reference line at zero (intercept 0, slope 0) in grey, color 8
abline(0,0, col=8)

#Add the same smoothing line from lm regression with color red #2
lines(lowess(df2$Fitted.Values, df2$Residuals), col=2)

[Image: residuals vs. fitted values plot for the random forest]
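To back up the visual impression with numbers, one option is to bin the observations by fitted value and compare the residual standard deviation in each bin, along with a rank correlation between the absolute residuals and the fitted values. This is only a minimal sketch in base R: the helper name `residual_spread`, its `n_bins` argument, and the simulated data are all hypothetical and are not part of randomForest.

```r
# Hypothetical helper: summarise how residual spread changes with the fitted value.
residual_spread <- function(fit_vals, observed, n_bins = 4) {
  res <- observed - fit_vals
  # Quantile-based bins so each bin holds roughly the same number of points
  breaks <- unique(quantile(fit_vals, probs = seq(0, 1, length.out = n_bins + 1)))
  bin <- cut(fit_vals, breaks = breaks, include.lowest = TRUE)
  list(
    # Residual standard deviation within each bin of fitted values
    sd_by_bin = tapply(res, bin, sd),
    # Spearman correlation between |residual| and fitted value;
    # values far from 0 hint at heteroscedasticity
    rho = cor(abs(res), fit_vals, method = "spearman")
  )
}

# Simulated stand-in data (assumption): the noise SD grows with the signal
set.seed(1)
fitted_sim   <- runif(500, 1, 10)
observed_sim <- fitted_sim + rnorm(500, sd = 0.5 * fitted_sim)

out <- residual_spread(fitted_sim, observed_sim)
print(out$sd_by_bin)
print(out$rho)
```

With the model above, the same helper could be called as `residual_spread(ozone.rf$predicted, airq$Ozone)`. A per-bin standard deviation that rises or falls steadily across bins points the same way as a fan shape in the plot.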

Update

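There is a simpler way. I realized the plot is essentially a regression of the residuals on the fitted values, so this gives a very similar plot:

fitted.values <- ozone.rf$predicted
residuals <- ozone.rf$y - fitted.values
#plot.lm draws the residuals of this auxiliary fit, so any linear trend in the residuals is removed
plot(lm(residuals ~ fitted.values), which=1)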
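One caveat about which predictions feed the residuals: `ozone.rf$predicted` holds out-of-bag predictions, whereas `predict()` with `newdata` set to the training data returns in-sample predictions, which usually fit much more tightly and can hide the spread. The sketch below compares the two. It assumes the randomForest package is installed and that `airq` is the built-in `airquality` data with incomplete rows dropped; the `has_rf` guard is there only so the snippet is skipped cleanly when the package is missing.

```r
# Sketch only; assumes the randomForest package is installed.
has_rf <- requireNamespace("randomForest", quietly = TRUE)

if (has_rf) {
  airq <- na.omit(airquality)  # assumption: complete cases of the built-in data
  set.seed(131)
  ozone.rf <- randomForest::randomForest(Ozone ~ ., data = airq, mtry = 3)

  fit_oob <- ozone.rf$predicted               # out-of-bag predictions
  fit_in  <- predict(ozone.rf, newdata = airq) # in-sample predictions

  res_oob <- airq$Ozone - fit_oob
  res_in  <- airq$Ozone - fit_in

  # Compare the overall residual spread of the two variants
  print(c(oob = sd(res_oob), in_sample = sd(res_in)))

  # Side-by-side residual plots on a common y-axis
  op <- par(mfrow = c(1, 2))
  ylim <- range(res_oob, res_in)
  plot(fit_oob, res_oob, ylim = ylim, main = "Out-of-bag",
       xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, col = 8)
  plot(fit_in, res_in, ylim = ylim, main = "In-sample",
       xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, col = 8)
  par(op)
}
```

For a heteroscedasticity check, the out-of-bag residuals are generally the safer choice, because each one comes from trees that did not see that observation during training.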