R caret gbm classification: predict.gbm(..., type = "response") from gbm and predict(..., type = "prob") from the caret package do not match

Posted: 2018-07-13 18:03:04

Tags: r classification r-caret gbm

Dear machine learning and R friends,

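I noticed that, with the same model tuning parameters, training a gbm classification model directly with gbm and training it through the train function of the caret package give different results (see the code below, which uses the iris dataset as a demonstration):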
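Note that the question's code does not pass any tuning values, so each call falls back to its own defaults (caret's default tuning grid on one side, the argument defaults of gbm() on the other), and nothing guarantees that those coincide. A minimal sketch of a like-for-like comparison, assuming the goal is identical hyperparameters: one explicit grid row is given to caret through tuneGrid and the same values are spelled out in gbm(). The grid values are illustrative only, the outcome is recoded as a factor with levels "no"/"yes" (an assumption, not the question's 0/1 coding), and bag.fraction = 1 is an added assumption that switches off gbm's random subsampling so the comparison depends less on the seed.

```r
library(caret)
library(gbm)

data(iris)
# Assumed recoding: a two-level factor with valid R names
iris$Species <- factor(ifelse(iris$Species == "virginica", "yes", "no"))

# One explicit set of hyperparameters used by both fits (illustrative values)
grid <- data.frame(n.trees = 100, interaction.depth = 1,
                   shrinkage = 0.1, n.minobsinnode = 10)

# caret: no resampling, a single model fitted with exactly this grid row;
# bag.fraction = 1 is forwarded through ... to the underlying gbm fit
set.seed(123)
fit.c <- train(Species ~ ., data = iris, method = "gbm",
               distribution = "multinomial", tuneGrid = grid,
               trControl = trainControl(method = "none"),
               bag.fraction = 1, verbose = FALSE)

# gbm directly, with the same values spelled out
set.seed(123)
fit.g <- gbm(Species ~ ., data = iris, distribution = "multinomial",
             n.trees = grid$n.trees, interaction.depth = grid$interaction.depth,
             shrinkage = grid$shrinkage, n.minobsinnode = grid$n.minobsinnode,
             bag.fraction = 1, verbose = FALSE)

p.c <- predict(fit.c, newdata = iris, type = "prob")
# multinomial "response" output is an n x K x length(n.trees) array; keep the one slice
p.g <- predict(fit.g, newdata = iris, n.trees = grid$n.trees, type = "response")[, , 1]

# Compare the same class, matched by name rather than by column position
summary(abs(p.c[, "yes"] - p.g[, "yes"]))
```

If the "yes" columns still disagree noticeably with matched settings, the discrepancy is not only a matter of defaults, which narrows down questions 1 and 2.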

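1) The class probabilities that gbm predicts for a factor with the multinomial distribution differ from those of the model trained with caret's train function. At best, they are just oddly squeezed into a narrow range around 0.5. What is going on?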
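One way to see how probabilities can cluster around 0.5: assuming predict.gbm's multinomial "response" output is a softmax over per-class scores, scores that have barely moved away from zero (for instance after few effective boosting steps) map to probabilities close to 1/K, which is 0.5 with two classes. The base-R sketch below uses made-up scores and a hypothetical softmax_rows() helper; it is an illustration of the mapping, not gbm's own code.

```r
# Row-wise softmax: turns per-class scores into probabilities that sum to 1.
# Hypothetical helper for illustration; not taken from the gbm package.
softmax_rows <- function(f) {
  e <- exp(f - apply(f, 1, max))  # subtract each row's maximum for stability
  e / rowSums(e)
}

set.seed(1)
n <- 150
K <- 2

# Made-up scores that have barely moved away from zero
f.small <- matrix(rnorm(n * K, sd = 0.05), nrow = n, ncol = K)
p.small <- softmax_rows(f.small)
range(p.small[, 1])  # close to 1 / K = 0.5

# Made-up scores that are well separated
f.large <- matrix(rnorm(n * K, sd = 3), nrow = n, ncol = K)
p.large <- softmax_rows(f.large)
range(p.large[, 1])  # spread over most of (0, 1)
```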

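2) The class order of the gbm predictions differs from that of the caret predictions (class 1 in gbm appears to be class 2 in caret [in a model with 2 classes]). Why is that?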
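Matching probability columns by position is fragile when two tools may order classes differently. A base-R sketch, with made-up probability tables and hypothetical helpers align_by_name() and predict_label(), that reorders columns by class name and reads the predicted label from the column names, which would make a manual swap like class.g2 unnecessary. It also shows that as.factor() on 0/1 values sorts the levels, so the level order is "0", "1".

```r
# Numeric 0/1 labels turned into a factor, as in the question:
# factor levels are sorted, so the level order is "0", "1"
y <- as.factor(c(1, 0, 1, 1, 0))
levels(y)  # "0" "1"

# Two made-up probability tables with the same classes in different column orders
p.a <- data.frame(`0` = c(0.9, 0.2), `1` = c(0.1, 0.8), check.names = FALSE)
p.b <- data.frame(`1` = c(0.15, 0.75), `0` = c(0.85, 0.25), check.names = FALSE)

# Reorder columns by class name instead of relying on their position
align_by_name <- function(p, lev) p[, lev, drop = FALSE]

# Read the predicted label from the column names (first column wins ties)
predict_label <- function(p) colnames(p)[max.col(as.matrix(p), ties.method = "first")]

p.b.aligned <- align_by_name(p.b, levels(y))
predict_label(p.a)          # "0" "1"
predict_label(p.b.aligned)  # "0" "1"
```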

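3) Does caret not support a binomial response variable with numeric classes and distribution = "bernoulli"? It issues a warning that a factor should be used for two classes. Could this also lead to different predictions?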
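With a numeric 0/1 response, gbm's bernoulli prediction on the response scale is a single probability for the class coded 1, while caret's type = "prob" output has one column per factor level. A base-R sketch, with made-up numbers and a hypothetical to_two_class() helper, that expands such a vector into a two-column table and builds the matching factor outcome; the labels "no"/"yes" are assumptions. Labels that are valid R names also avoid the X0/X1 renaming that data.frame() applies to columns called "0" and "1".

```r
# Made-up bernoulli output on the response scale: P(y == 1) for each row
p1 <- c(0.92, 0.10, 0.55)

# Expand to one column per class; labels[1] stands for y == 0, labels[2] for y == 1
# (hypothetical helper; "no"/"yes" are assumed labels)
to_two_class <- function(p1, labels = c("no", "yes")) {
  out <- data.frame(1 - p1, p1)
  names(out) <- labels
  out
}
probs <- to_two_class(p1)

# Matching factor outcome for caret, with the levels in the same order
y.num <- c(1, 0, 1)
y.fac <- factor(y.num, levels = c(0, 1), labels = c("no", "yes"))
levels(y.fac)  # "no" "yes"
```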

These mismatch "problems" do not seem to occur when randomForest is applied to the same dataset, both directly and via caret.

library(caret)
library(gbm)

data(iris)
iris$Species <- as.numeric(iris$Species == "virginica")  # binary 0/1 response: 1 = virginica

###caret

ctrl <- trainControl(method = "none")  # no resampling: fit a single model

set.seed(123)
gbm.c <- train(as.factor(Species) ~ ., data = iris, distribution = "multinomial",
               method = "gbm", trControl = ctrl, verbose = FALSE)

pr1 <- predict(gbm.c, newdata = iris, type = "prob")
pr1 <- data.frame(pr1)
max(pr1[, 1])
min(pr1[, 1])  # here the probabilities range from 0 to 1. Perfect.

### gbm
set.seed(123)
gbm.g <- gbm(as.factor(Species) ~ ., data = iris, distribution = "multinomial", verbose = FALSE)

pr2 <- predict(gbm.g, newdata = iris, n.trees = 100, type = "response")
pr2 <- data.frame(pr2)

max(pr2[, 1])
min(pr2[, 1])  # strange: for predict.gbm the whole range lies between 0.4 and 0.6 only; is some unclear scaling happening?

cor(pr1[, 1], pr2[, 1])  # the correlation (and R^2) look good, but it is negative: why are factor levels 1 and 2 swapped in one of the two?
plot(pr1[, 1], pr2[, 1])


# index of the most probable class per row (which.max returns a single index even on ties)
class.c <- apply(pr1, 1, which.max)
class.g <- apply(pr2, 1, which.max)


class.c == class.g
class.g2 <- rep(1, length(class.g))
class.g2[class.g == 1] <- 2
class.c == class.g2  # class prediction seems to work, even though the scaling is puzzling and one has to know about the swapped order

### randomForest
library(randomForest)
set.seed(123)
rf.c <- train(as.factor(Species) ~ ., data = iris, method = "rf", trControl = trainControl(method = "none"))

pr.rf1 <- predict(rf.c, newdata = iris, type = "prob")
pr.rf1 <- data.frame(pr.rf1)  # caret randomForest probabilities as a data frame
set.seed(123)
rf.r <- randomForest(as.factor(Species) ~ ., data = iris)
pr.rf2 <- predict(rf.r, newdata = iris, type = "prob")

cor(pr.rf1[, 1], pr.rf2[, 1])
plot(pr.rf1[, 1], pr.rf2[, 1])
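To test the "same tuning parameters" assumption directly, the settings recorded on each fitted gbm model can be printed side by side. A minimal sketch, assuming that caret keeps the underlying gbm fit in finalModel and that gbm stores n.trees, interaction.depth, shrinkage, n.minobsinnode and bag.fraction as fields of the fitted object (worth confirming with names() on the installed version). used() is a hypothetical helper, and the outcome is recoded as a "no"/"yes" factor for convenience. No output is shown because the values depend on the installed caret and gbm versions.

```r
library(caret)
library(gbm)

data(iris)
# Assumed recoding: a two-level factor with valid R names
iris$Species <- factor(ifelse(iris$Species == "virginica", "yes", "no"))

# Fits that rely on each function's defaults, as in the question
set.seed(123)
fit.c <- train(Species ~ ., data = iris, method = "gbm",
               distribution = "multinomial",
               trControl = trainControl(method = "none"), verbose = FALSE)
set.seed(123)
fit.g <- gbm(Species ~ ., data = iris, distribution = "multinomial", verbose = FALSE)

# Collect the settings recorded on a fitted gbm object (hypothetical helper)
used <- function(m) c(n.trees = m$n.trees, interaction.depth = m$interaction.depth,
                      shrinkage = m$shrinkage, n.minobsinnode = m$n.minobsinnode,
                      bag.fraction = m$bag.fraction)

tab <- rbind(caret = used(fit.c$finalModel), gbm = used(fit.g))
tab
fit.c$bestTune  # the grid row caret selected
```

If the two rows differ, for example in shrinkage or n.trees, the fits are not like-for-like and their probabilities cannot be expected to match.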
