R^2 in a random forest using cforest()

Time: 2017-04-24 12:15:49

Tags: r random-forest variation

Dear programmers and statisticians,

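I would like to know how to retrieve (or compute) R^2 when using the function cforest() from the package party. The function randomForest(), from the package of the same name, returns the coefficient of determination, whereas cforest() does not. I read here https://stats.stackexchange.com/questions/7357/manually-calculated-r2-doesnt-match-up-with-randomforest-r2-for-testing that, for randomForest(), R^2 can be calculated with the following formula: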

R2<-1 - sum((y-predicted)^2)/sum((y-mean(y))^2) # y is the actual value
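As a minimal sketch, the formula can be wrapped in a small helper so that the same computation is applied to any pair of observed and predicted vectors. The function name r_squared and the toy vectors are illustrative assumptions; they do not come from randomForest or party.

```r
# Hypothetical helper (not part of randomForest or party):
# coefficient of determination from observed and predicted values.
r_squared <- function(y, predicted) {
  stopifnot(length(y) == length(predicted))
  ss_res <- sum((y - predicted)^2)  # residual sum of squares
  ss_tot <- sum((y - mean(y))^2)    # total sum of squares around the mean
  1 - ss_res / ss_tot
}

# Toy data, invented for illustration only
y_toy    <- c(1, 2, 3, 4)
pred_toy <- c(1.1, 1.9, 3.2, 3.8)
r2_toy   <- r_squared(y_toy, pred_toy)
print(r2_toy)
```

Perfect predictions give 1, and predicting the mean of y for every observation gives 0.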

However, when I compare the R^2 values from randomForest() and cforest(), I find a large difference:

#### Minimal reproducible example ####

### Vectors ###

ARTICLE<-c("Yes", "Yes", "No", "Yes", "No", "No", 
"Yes", "No", "No", "No", "No", "No", "Yes", "No", "Yes", "No", "No", "No", "No", "No", "No", "No", 
"No", "No", "No", "No", "No", "No", "No", "No", "Yes", "No", "No", "No", "No", "No", "No", "No", 
"No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", 
"No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", 
"Yes", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", 
"No", "No", "No", "No", "No", "No", "Yes", "No", "No")

COMPSYNT<-c("NP", "NP", "DetPoss", "NP", "NP", "NP", "NP", "NP", "NP", 
    "NP", "NP", "NP", "PronPers", "NP", "PronForm", "NP", "NP", "NP", "PronForm", "NP", "PronForm", "NP", "PronForm", "PronForm", 
    "NP", "PronPers", "PronForm", "NP", "DetPoss", "NP", "PronForm", "NFClau", "PronForm", "NP", "NP", "NP", "PronForm", "PronForm", "PronForm", 
    "NP", "NP", "NP", "PronForm", "PronForm", "NP", "NP", "PronForm", "PronForm", "NP", "PronForm", "PronForm", "NP", "NFClau", "NP", 
    "PronForm", "NP", "NP", "NP", "NP", "NP", "NP", "PronForm", "PronForm", "NP", "NP", "NP", "PronForm", "NP", "PronForm", 
    "NP", "PronForm", "NFClau", "PronForm", "NP", "NP", "NFClau", "PronForm", "NP", "NP", "NP", "PronForm", "PronForm", "PronForm", "NP", 
    "PronForm", "NP", "NP", "PronForm", "PronForm", "PronForm", "NP", "PronForm", "PronPers", "NP", "NP")

POSITION<-c("Fin", "Fin", "Med", "Med", "Fin", "Fin", "Fin", "Fin", "Fin", 
    "Fin", "Med", "Fin", "Init", "Fin", "Med", "Fin", "Fin", "Fin", "Init", "Fin", "Init", "Init", "Init", "Init", 
    "Fin", "Fin", "Fin", "Fin", "Init", "Init", "Init", "Fin", "Init", "Init", "Fin", "Fin", "Init", "Init", "Init", 
    "Fin", "Fin", "Med", "Med", "Init", "Init", "Fin", "Fin", "Init", "Fin", "Fin", "Fin", "Fin", "Med", "Init", 
    "Init", "Med", "Fin", "Fin", "Init", "Init", "Med", "Init", "Init", "Fin", "Fin", "Init", "Init", "Init", "Init", 
    "Fin", "Fin", "Med", "Init", "Fin", "Fin", "Med", "Init", "Fin", "Fin", "Fin", "Init", "Init", "Fin", "Init", 
    "Init", "Fin", "Fin", "Init", "Init", "Init", "Fin", "Init", "Fin", "Fin", "Init")

COMPTYPE<-c("Abstr_1", 
    "Conc", "Hum", "Abstr_2", "Hum", "Hum", "Conc", "Hum", "Hum", "Hum", "Hum", "Hum", "Hum", "Hum", "Conc", "Hum", 
    "Hum", "Hum", "Hum", "Abstr_2", "Hum", "Hum", "Hum", "Hum", "Abstr_1", "Conc", "Abstr_1", "Conc", "Conc", "Abstr_1", "Conc", 
    "Abstr_2", "Hum", "Abstr_1", "Abstr_1", "Conc", "Conc", "Plant", "Hum", "Conc", "Abstr_2", "Conc", "Abstr_1", "Abstr_1", "Abstr_1", "Hum", 
    "Abstr_1", "Conc", "Hum", "Abstr_1", "Abstr_2", "Abstr_1", "Abstr_2", "Conc", "Hum", "Abstr_1", "Conc", "Abstr_1", "Hum", "Abstr_1", "Abstr_1", 
    "Hum", "Abstr_2", "Conc", "Abstr_1", "Conc", "Hum", "Conc", "Abstr_1", "Conc", "Abstr_1", "Abstr_2", "Conc", "Conc", "Hum", "Abstr_2", 
    "Conc", "Abstr_2", "Abstr_2", "Conc", "Abstr_2", "Conc", "Abstr_1", "Abstr_1", "Abstr_1", "Abstr_2", "Hum", "Hum", "Conc", "Abstr_2", "Abstr_1", 
    "Hum", "Abstr_2", "Conc", "Hum")

SUBSTYPE<-c("Repl", 
    "Repl", "Repl", "Contr", "Repl", "Repl", "Repl", "Repl", "Repl", "Repl", "Repl", "Repl", "Comp", "Repl", "Repl", "Repl", 
    "Comp", "Repl", "Repl", "Repl", "Repl", "Repl", "Repl", "Repl", "Repl", "Repl", "Repl", "Repl", "Contr", "Contr", "Contr", 
    "Contr", "Repl", "Repl", "Repl", "Repl", "Repl", "Repl", "Repl", "Repl", "Repl", "Repl", "Repl", "Repl", "Contr", "Contr", 
    "Contr", "Repl", "Contr", "Repl", "Repl", "Repl", "Contr", "Contr", "Repl", "Contr", "Repl", "Repl", "Repl", "Contr", "Contr", 
    "Repl", "Contr", "Repl", "Contr", "Repl", "Repl", "Contr", "Contr", "Contr", "Contr", "Contr", "Contr", "Contr", "Contr", "Contr", 
    "Contr", "Repl", "Repl", "Comp", "Repl", "Repl", "Repl", "Contr", "Repl", "Contr", "Contr", "Repl", "Repl", "Contr", "Contr", 
    "Repl", "Contr", "Repl", "Repl")

VARIANT<-c("1", "1", "1", "1", "1", "1", "1", "1", "1", 
    "1", "1", "1", "1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2", 
    "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", 
    "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", 
    "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", 
    "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", 
    "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2")

PERIOD<-c("1", "1", "1", 
    "1", "1", "1", "1", "1", "1", "1", "3", "3", "3", "3", "3", "4", "4", "4", 
    "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", 
    "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", 
    "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", 
    "1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2", "3", "3", "3", 
    "3", "3", "3", "3", "3", "3", "3", "3", "4", "4", "4", "4", "4", "4", "4", 
    "4", "4")

PRED<-c(0.9479936898, 0.919449515, 0.9419154421, 0.5983387557, 
    0.6095731951, 0.6095731951, 0.919449515, 0.6095731951, 0.6095731951, 
    0.6095731951, 0.7030330529, 0.7525290886, 0.5973901173, 0.7525290886, 
    0.8111631081, 0.7758242732, 0.655754515, 0.7758242732, 0.3617200806, 
    0.204189421, 0.3617200806, 0.4091156245, 0.3617200806, 0.3617200806, 
    0.1909593012, 0.111197398, 0.1317200524, 0.1401576975, 0.3357625661, 
    0.0354262613, 0.3251898421, 0.0026529555, 0.3617200806, 0.1277255725, 
    0.1909593012, 0.1401576975, 0.0920054464, 0.0826276571, 0.3617200806, 
    0.1401576975, 0.204189421, 0.1362205175, 0.1076221699, 0.1021952872, 
    0.0354262613, 0.2225893662, 0.013977198, 0.0920054464, 0.2225893662, 
    0.1317200524, 0.1170159378, 0.1909593012, 0.0025081381, 0.0223554982, 
    0.3617200806, 0.0538830716, 0.1401576975, 0.1909593012, 0.4091156245, 
    0.0354262613, 0.0538830716, 0.3617200806, 0.0054761797, 0.1401576975, 
    0.051214367, 0.1171345461, 0.3617200806, 0.0223554982, 0.0141969626, 
    0.0331976869, 0.4525577246, 0.0023048079, 0.0103973282, 0.0331976869, 
    0.2786798396, 0.0025693648, 0.0119143655, 0.3508813284, 0.3508813284, 
    0.1910649906, 0.1038908339, 0.1222175396, 0.260972475, 0.0380847154, 
    0.1368486957, 0.0294733117, 0.3138516914, 0.4183846938, 0.1219226877, 
    0.0062738871, 0.0939148073, 0.4183846938, 0.3356194269, 0.3046349387, 
    0.4823614353)

DEV<-c(0.4479936898, 
    0.419449515, 0.4419154421, 0.0983387557, 0.1095731951, 0.1095731951, 
    0.419449515, 0.1095731951, 0.1095731951, 0.1095731951, 0.2030330529, 
    0.2525290886, 0.0973901173, 0.2525290886, 0.3111631081, 0.2758242732, 
    0.155754515, 0.2758242732, -0.1382799194, -0.295810579, -0.1382799194, 
    -0.0908843755, -0.1382799194, -0.1382799194, -0.3090406988, 
    -0.388802602, -0.3682799476, -0.3598423025, -0.1642374339, 
    -0.4645737387, -0.1748101579, -0.4973470445, -0.1382799194, 
    -0.3722744275, -0.3090406988, -0.3598423025, -0.4079945536, 
    -0.4173723429, -0.1382799194, -0.3598423025, -0.295810579, 
    -0.3637794825, -0.3923778301, -0.3978047128, -0.4645737387, 
    -0.2774106338, -0.486022802, -0.4079945536, -0.2774106338, 
    -0.3682799476, -0.3829840622, -0.3090406988, -0.4974918619, 
    -0.4776445018, -0.1382799194, -0.4461169284, -0.3598423025, 
    -0.3090406988, -0.0908843755, -0.4645737387, -0.4461169284, 
    -0.1382799194, -0.4945238203, -0.3598423025, -0.448785633, 
    -0.3828654539, -0.1382799194, -0.4776445018, -0.4858030374, 
    -0.4668023131, -0.0474422754, -0.4976951921, -0.4896026718, 
    -0.4668023131, -0.2213201604, -0.4974306352, -0.4880856345, 
    -0.1491186716, -0.1491186716, -0.3089350094, -0.3961091661, 
    -0.3777824604, -0.239027525, -0.4619152846, -0.3631513043, 
    -0.4705266883, -0.1861483086, -0.0816153062, -0.3780773123, 
    -0.4937261129, -0.4060851927, -0.0816153062, -0.1643805731, 
    -0.1953650613, -0.0176385647)

### Combining the vectors into a data frame ###

# data.frame() keeps DEV and PRED numeric; stringsAsFactors=TRUE turns the
# categorical columns into factors (cbind() would coerce every column to character)
mydata<-data.frame(ARTICLE, COMPSYNT, COMPTYPE, DEV, PERIOD, POSITION, PRED, SUBSTYPE, VARIANT, stringsAsFactors=TRUE)

### First random forest on my data: 'randomForest' (package: 'randomForest') ###
library(randomForest)
set.seed(123)
mydata.rf1<-randomForest(DEV ~ ARTICLE + COMPSYNT + POSITION + COMPTYPE + SUBSTYPE + PERIOD, data=mydata, ntree=2000, mtry=2, importance=TRUE)
print(mydata.rf1)

Call:
 randomForest(formula = DEV ~ ARTICLE + COMPSYNT + POSITION +      COMPTYPE + SUBSTYPE + PERIOD, data = mydata, ntree = 2000,      mtry = 2, importance = TRUE) 
               Type of random forest: regression
                     Number of trees: 2000
No. of variables tried at each split: 2

          Mean of squared residuals: 0.0110483
                    % Var explained: 83.33
## MSE = 0.0110483
## pseudo-R^2 = 0.8333

### Second random forest on my data: 'cforest' (package: 'party') ###

library(party)
set.seed(123)
mydata.rf2<-cforest(DEV ~ ARTICLE + COMPSYNT + POSITION + COMPTYPE + SUBSTYPE + PERIOD, data=mydata, controls=cforest_unbiased(ntree=2000, mtry=2))
oob.pred<-predict(mydata.rf2, type="response", OOB=TRUE)
residual<-DEV-oob.pred
mse<-sum(residual^2)/length(DEV)
pseudo.R2<-1-mse/var(DEV)

## MSE = 0.0380004
## pseudo-R^2 = 0.4327
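One detail that can be checked independently of either package: the cforest code above divides the MSE by var(DEV), which uses the divisor n - 1, whereas the formula quoted earlier divides by sum((y-mean(y))^2), which corresponds to the divisor n. The sketch below computes both versions side by side. The helper name oob_r2 and the demo vectors are illustrative assumptions. The two results differ only by the factor (n - 1)/n, which is close to 1 for the 95 observations here, so this is a consistency check rather than an explanation of the gap.

```r
# Hypothetical helper: pseudo-R^2 from out-of-bag predictions, computed with
# both variance conventions so that the two can be compared.
oob_r2 <- function(y, oob_pred) {
  oob_pred <- as.numeric(oob_pred)  # accepts a plain vector or a one-column matrix
  n <- length(y)
  stopifnot(length(oob_pred) == n)
  mse <- sum((y - oob_pred)^2) / n
  c(mse = mse,
    r2_sample_var = 1 - mse / var(y),                      # divisor n - 1
    r2_population_var = 1 - mse / (var(y) * (n - 1) / n))  # divisor n
}

# Demo data, invented for illustration only
y_demo    <- c(0.5, -0.2, 0.1, -0.4, 0.3)
pred_demo <- c(0.4, -0.1, 0.0, -0.3, 0.2)
res_demo  <- oob_r2(y_demo, pred_demo)
print(res_demo)
```

With the objects from the example, the call would be oob_r2(mydata$DEV, predict(mydata.rf2, type="response", OOB=TRUE)).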

I cannot figure out why there is such a large difference between my two R^2 values. My questions are as follows:

1) Can the formula for R^2 above be used with cforest()? If so, why do I get such different values?

2) Is there a simpler way to retrieve R^2 when using cforest()?

Thank you in advance for your explanations and suggestions.

0 answers:

No answers