Nonsensical predictions using the segmented package in R

Date: 2014-04-24 02:48:41

Tags: r package prediction glm

I first fitted a Poisson GLM in R as follows:

> Y<-c(13,21,12,11,16,9,7,5,8,8)
> X<-c(74,81,80,79,89,96,69,88,53,72)
> age<-c(50.45194,54.89382,46.52569,44.84934,53.25541,60.16029,50.33870,
+ 51.44643,38.20279,59.76469)
> dat=data.frame(Y=Y,off.set.term=log(X),age=age)
> fit.1=glm(Y~age+offset(off.set.term),data=dat,family=poisson)

Next, I tried to predict the response (on the log scale) for a new data set using the predict function. Note that I set the offset term to zero.

> newdat=data.frame(age=c(52.09374,50.89329,50.61472,39.13358,44.79453),off.set.term=rep(0,5))
> predict(fit.1,newdata =newdat,type="link")
        1         2         3         4         5 
-1.964381 -1.956234 -1.954343 -1.876416 -1.914839 
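
As a sanity check on what type="link" returns here, the linear predictor can be rebuilt by hand from the coefficients. This is only a minimal sketch using the data above; the names b, manual and pred are introduced for illustration. With the offset set to zero, the prediction should reduce to intercept + slope * age, i.e. the log rate per unit of X.

```r
Y <- c(13, 21, 12, 11, 16, 9, 7, 5, 8, 8)
X <- c(74, 81, 80, 79, 89, 96, 69, 88, 53, 72)
age <- c(50.45194, 54.89382, 46.52569, 44.84934, 53.25541, 60.16029,
         50.33870, 51.44643, 38.20279, 59.76469)
dat <- data.frame(Y = Y, off.set.term = log(X), age = age)
fit.1 <- glm(Y ~ age + offset(off.set.term), data = dat, family = poisson)

newdat <- data.frame(age = c(52.09374, 50.89329, 50.61472, 39.13358, 44.79453),
                     off.set.term = rep(0, 5))

# Linear predictor by hand: intercept + slope * age + offset (zero here).
b <- coef(fit.1)
manual <- unname(b["(Intercept)"] + b["age"] * newdat$age + newdat$off.set.term)

# The same quantity as computed by predict() on the link (log) scale.
pred <- unname(predict(fit.1, newdata = newdat, type = "link"))

# Compare the two; any mismatch would point at how the offset is handled.
print(all.equal(manual, pred))
```

If the two agree for fit.1, the same comparison is a quick way to see whether a different fitting function treats the offset consistently.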

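Next, I tried the segmented package (version 0.3-0.0) in R and fitted a segmented GLM as follows. (The latest version of the segmented package, 0.3-1.0, does not seem to support offset terms when using the predict function.)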

> library(segmented)
> fit.2=segmented(fit.1,seg.Z=~age,psi=list(age=mean(age)),
+ offs=off.set.term,data=newdat)

I then used the predict function on fit.2 to obtain the predicted values:

> predict(fit.2,newdata =newdat,type="link")
        1         2         3         4         5 
-26.62968 -26.08611 -25.95997 -20.76125 -23.32456 

These predicted values are clearly different from the ones I obtained with fit.1.

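The problem seems to lie in the offset term, because when the models are fitted without an offset term, the results are reasonable and close to each other, as follows: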

> fit.3=glm(Y~age,data=dat,family=poisson)
> newdat.2=data.frame(age=c(52.09374,50.89329,50.61472,39.13358,44.79453))
> predict(fit.3,newdata =newdat.2,type="link")
       1        2        3        4        5 
2.406016 2.395531 2.393098 2.292816 2.342261 
> fit.4=segmented(fit.3,seg.Z=~age,psi=list(age=mean(age)),data=newdat.2)
> predict(fit.4,newdata =newdat.2,type="link")
       1        2        3        4        5 
2.577669 2.524503 2.512165 2.003679 2.254396 

1 Answer:

Answer 0: (score: 1)

Since I got an answer from the maintainer of the segmented package, I decided to share it here. First, update the package to version 0.3-1.0 with:

install.packages("segmented",type="source")
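
To confirm that the update actually took effect before rerunning anything, the installed version can be checked from R. A minimal sketch follows; has_min_version is a hypothetical helper written for this post, not part of segmented, and it only relies on base R's requireNamespace and packageVersion.

```r
# Hypothetical helper: TRUE only if `pkg` is installed and at least version `min`.
has_min_version <- function(pkg, min) {
  requireNamespace(pkg, quietly = TRUE) && packageVersion(pkg) >= min
}

# Check for the version mentioned above (FALSE if segmented is not installed).
print(has_min_version("segmented", "0.3-1.0"))
```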

After updating, running the same commands gives:

> Y<-c(13,21,12,11,16,9,7,5,8,8)
> X<-c(74,81,80,79,89,96,69,88,53,72)
> age<-c(50.45194,54.89382,46.52569,44.84934,53.25541,60.16029,50.33870,
+ 51.44643,38.20279,59.76469)
> dat=data.frame(Y=Y,off.set.term=log(X),age=age)
> fit.1=glm(Y~age+offset(off.set.term),data=dat,family=poisson)
> 
> newdat=data.frame(age=c(52.09374,50.89329,50.61472,39.13358,44.79453),off.set.term=rep(0,5))
> predict(fit.1,newdata =newdat,type="link")
        1         2         3         4         5 
-1.964381 -1.956234 -1.954343 -1.876416 -1.914839 
> 
> library(segmented)
> fit.2=segmented(fit.1,seg.Z=~age,psi=list(age=mean(age)),offs=off.set.term,data=newdat)
> predict(fit.2,newdata =newdat,type="link")
Error in offset(off.set.term) : object 'off.set.term' not found

So the offset term cannot be found. The trick (for now) is to attach newdat first and then predict, as follows:

> attach(newdat)
The following object is masked _by_ .GlobalEnv:

    age
> predict(fit.2,newdata =newdat,type="link")
        1         2         3         4         5 
-1.825831 -1.853842 -1.860342 -2.128237 -1.996147 
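
Two caveats about attach are worth keeping in mind. The message above says that age is masked by .GlobalEnv, so a bare age still resolves to the global training vector rather than to newdat$age, and an attached data frame stays on the search path until it is detached. A minimal sketch of one way to limit the second issue is below; with_attached is a hypothetical helper written for this post (plain base R, nothing from segmented) that attaches a data frame, runs a function, and always detaches afterwards.

```r
# Hypothetical helper: attach `data` only for the duration of `fun()`.
with_attached <- function(data, fun) {
  attach(data, name = "tmp_attached_data", warn.conflicts = FALSE)
  # Detach even if fun() fails, so the search path is left clean.
  on.exit(detach("tmp_attached_data", character.only = TRUE), add = TRUE)
  fun()
}

newdat <- data.frame(age = c(52.09374, 50.89329, 50.61472, 39.13358, 44.79453),
                     off.set.term = rep(0, 5))

# While attached, off.set.term can be found by name on the search path.
print(with_attached(newdat, function() get("off.set.term")))

# Afterwards the temporary entry is gone from the search path.
print("tmp_attached_data" %in% search())
```

Assuming fit.2 from above, the prediction could then be wrapped as with_attached(newdat, function() predict(fit.2, newdata = newdat, type = "link")).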

Now the results do make sense. Cheers!