Question

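So I'm trying to understand random forests and partial dependence plots from the ground up. To do this, I'm using a simple system where I know the equation that generates the data fed into the random forest.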

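My understanding of a partial dependence plot is that the variable of interest keeps its values while all the remaining variables are set to their mean. This new data (with the other variables at their means) is then run through the fitted random forest, and the predicted response is plotted as the partial dependence.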
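For comparison, the two procedures can be written side by side. The notation is an assumption introduced here: $\hat f$ is the fitted forest and $(x_{1i}, x_{2i})$, $i = 1,\dots,n$, are the training rows. The second formula is the definition given in the `randomForest::partialPlot` documentation.

```latex
% Mean substitution (the procedure described above):
\tilde f_{\text{mean}}(v) = \hat f\!\left(v,\ \bar x_2\right),
  \qquad \bar x_2 = \frac{1}{n}\sum_{i=1}^{n} x_{2i}

% Partial dependence as defined in the partialPlot documentation:
\tilde f_{\text{PD}}(v) = \frac{1}{n}\sum_{i=1}^{n} \hat f\!\left(v,\ x_{2i}\right)
```

The two curves agree for every $v$ when $\hat f(v, x_2)$ is affine in $x_2$, because averaging then commutes with $\hat f$. A forest's piecewise-constant fit generally is not affine, so the two curves can differ.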

However, my results are always slightly offset from the partial dependence line produced by the partialPlot function in the randomForest package.

Any ideas what could be causing the offset?

I've included the code I used below.

Code:

library(randomForest)

set.seed(123) # Set the seed so results can be reproduced

range = seq(0,1,0.01) # Range of numbers for equations
constant1 = 2 # First constant
constant2 = 3 # Second constant
x1 = range/(constant1+range) # First equation
x2 = range/(constant2+range) # Second equation
random = runif(101,0,0.05) # Add some noise
y = x1 - x2 + random # Equation for the response variable
pred_data = cbind(x1,x2) # Combine the first and second predictor variables into one matrix

Mdl = randomForest(pred_data,y,importance=TRUE,ntree=500,keep.inbag=TRUE,keep.forest=TRUE) # Fit the random forest model

partialPlot(Mdl,pred_data,x1) # Create a partial dependence plot for the first predictor variable

pred_data[,2] = mean(x2) # Set every value of the second predictor to its mean
predictions = predict(Mdl,pred_data) # Run the modified data through the model
points(x1,predictions) # Overlay the new predictions as points on the partial dependence plot
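A way to check where the offset comes from is to compute both curves on the same grid and compare them with the values `partialPlot` returns (it returns its `x`/`y` points invisibly, and `plot = FALSE` suppresses drawing). The sketch below assumes the averaging formula stated in the `partialPlot` documentation; `manual_pd` and `grid_vals` are hypothetical names introduced for illustration, not part of the package.

```r
library(randomForest)

# Same toy system as in the question (grid_vals is a hypothetical rename of `range`).
set.seed(123)
grid_vals <- seq(0, 1, 0.01)
x1 <- grid_vals / (2 + grid_vals)
x2 <- grid_vals / (3 + grid_vals)
y  <- x1 - x2 + runif(101, 0, 0.05)
pred_data <- cbind(x1, x2)
Mdl <- randomForest(pred_data, y, ntree = 500)

# Hypothetical helper: for each grid value v, set the variable of interest to v
# in EVERY training row, keep the other columns at their observed values,
# predict, and average the predictions (the documented partial dependence).
manual_pd <- function(model, data, var, points) {
  sapply(points, function(v) {
    tmp <- data
    tmp[, var] <- v
    mean(predict(model, tmp))
  })
}

# Values partialPlot computes, without drawing the plot.
pd <- partialPlot(Mdl, pred_data, "x1", plot = FALSE)

# Averaging over the training rows, evaluated on partialPlot's grid.
manual <- manual_pd(Mdl, pred_data, "x1", pd$x)

# Mean substitution, as in the question: x2 fixed at mean(x2) for every grid point.
mean_sub <- predict(Mdl, cbind(x1 = pd$x, x2 = mean(x2)))

print(max(abs(manual - pd$y)))    # expected to be ~0 if the averaging sketch matches
print(max(abs(mean_sub - pd$y)))  # typically > 0: the gap between the two definitions
```

If the first difference is essentially zero while the second is not, the offset comes from substituting the mean of x2 rather than averaging predictions over its observed values. Note also that `partialPlot` evaluates on an evenly spaced grid of at most 51 points rather than at the training x1 values.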

Why doesn't my plot match the partial dependence plot from the randomForest package?
