Error in piecewise regression in R with three covariates and two breakpoints

Posted: 2014-09-17 15:44:12

Tags: r linear-regression piecewise

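I am trying to estimate the breakpoints of a response variable V that depends on three covariates (X, Y, Z), assuming two breakpoints.

The response is V = aX + bY + cZ + d.

I simulate data in which (a, b, c, d) takes three sets of values: (0.6, 0.2, 0.8, 0.15), (1.6, 1.2, 1.8, 1.15) and (3, 5, 4, 2.5).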
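Written per observation i = 1, ..., 830, the simulation in the code below amounts to the following model, where the noise comes from 0.01*rnorm (standard deviation 0.01). Note that segmented places breakpoints on covariate values, not on the index i; because xx = cumsum(ref) is non-decreasing, the regime changes at i = 200 and i = 400 correspond, up to the added noise, to X values near X_200 and X_400.

$$
V_i = a_{k(i)} X_i + b_{k(i)} Y_i + c_{k(i)} Z_i + d_{k(i)} + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0,\,0.01^2)
$$

$$
k(i) = \begin{cases}
1, & 1 \le i \le 200, \quad (a_1, b_1, c_1, d_1) = (0.6,\ 0.2,\ 0.8,\ 0.15)\\
2, & 201 \le i \le 400, \quad (a_2, b_2, c_2, d_2) = (1.6,\ 1.2,\ 1.8,\ 1.15)\\
3, & 401 \le i \le 830, \quad (a_3, b_3, c_3, d_3) = (3,\ 5,\ 4,\ 2.5)
\end{cases}
$$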

I estimate the coefficients with the segmented package but get the following error:

    Error in segmented.lm(linearFit, seg.Z = ~X + Y + Z, psi = list(X = c(NA),  :
      Bootstrap restart only with a fixed number of breakpoints

Here is my code, including the data:

    #trapezoidal data    
    ref=c(rep(1,100),seq(1,10,0.05),rep(10,150),seq(10,0,-0.05),rep(0,200))

    #covariates
    xx=cumsum(ref) 
    yy=diff(xx)
    zz=diff(yy)

    #equalizing lengths of above vectors
    vecL=length(zz)
    xx=xx[1:vecL]
    yy=yy[1:vecL]
    zz=zz[1:vecL]

    #adding noise to covariates
    set.seed(10)
    X=xx + max(xx)/100*rnorm(vecL)
    Y=yy + max(yy)/100*rnorm(vecL)
    Z=zz + max(zz)/100*rnorm(vecL)

    #three-segment response variable, 830 points in total
    V=numeric(vecL)  #preallocate V; assigning to V[1:200] fails if V does not exist yet
    V[1:200]   = 0.6 *X[1:200]+   0.2 *Y[1:200]+   0.8 *Z[1:200]+   0.15 + 0.01*rnorm(200)
    V[201:400] = 1.6 *X[201:400]+ 1.2 *Y[201:400]+ 1.8 *Z[201:400]+ 1.15 + 0.01*rnorm(200)
    V[401:830] = 3.0 *X[401:830]+ 5.0 *Y[401:830]+ 4.0 *Z[401:830]+ 2.50 + 0.01*rnorm(430)

    ##linear model

    linearFit=lm(formula=V~X+Y+Z)
    summary(linearFit)


    ##segmented
    library(segmented)

    segFit=segmented(linearFit,seg.Z=~X+Y+Z,psi=list(X=c(NA),Y=c(NA),Z=c(NA)),control=seg.control(display=TRUE, K=4, stop.if.error=FALSE))

Here is the output:

    segFit=segmented(linearFit,seg.Z=~X+Y+Z,psi=list(X=c(NA),Y=c(NA),Z=c(NA)),control=seg.control(display=TRUE, K=4, stop.if.error=FALSE))
    Error in segmented.lm(linearFit, seg.Z = ~X + Y + Z, psi = list(X = c(NA),  : 
      Bootstrap restart only with a fixed number of breakpoints

Am I setting psi and the control options correctly? Any help is appreciated.

1 answer:

Answer 0 (score: 0)

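Automatic breakpoint selection seems to be quite experimental, as the documentation points out, so supplying a fixed number of starting values would be better. Still, I could get the fit to start running by setting n.boot=0, which switches off the bootstrap restarting that triggers this error (I also raised it.max to 50):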

    segFit=segmented(linearFit,seg.Z=~X+Y+Z,psi=list(X=c(NA),Y=c(NA),Z=c(NA)),
                     control=seg.control(display=TRUE, K=4, stop.if.error=FALSE, n.boot=0, it.max=50))
    #0   287035116.259  (No breakpoint(s)) 
    #1   52847700.113  12 
    #2   66421579.610  7 
    #3   60143023.830  7 
    #4   55936266.042  7 
    #5   45478319.984  5 
    #6   37237514.620  5 
    #7   34058342.767  5 
    #8   33889551.970  3 
    #9   33679837.419  3 
    #10  33680392.183  3 
    #Error in eval(expr, envir, enclos) : object 'U1.Y' not found

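That still fails, this time with "object 'U1.Y' not found". My interpretation is that no breakpoint survived for Y, so I removed Y from the breakpoint formula: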

    segFit=segmented(linearFit,seg.Z=~X+Z,psi=list(X=c(NA),Z=c(NA)),
                     control=seg.control(display=TRUE, K=4, stop.if.error=FALSE, n.boot=0, it.max=50))
    #0   287035116.259  (No breakpoint(s)) 
    #1   57518175.693  8 
    #2   75024714.551  4 
    #3   53678468.904  4 
    #4   42978477.989  4 
    #5   36762393.424  4 
    #6   34564133.079  4 
    #7   33672729.061  4 
    #8   33672705.918  4 
    #Error in eval(expr, envir, enclos) : object 'U1.Z' not found

It still fails, so let's remove Z as well:

    segFit=segmented(linearFit,seg.Z=~X,psi=list(X=c(NA)),
                     control=seg.control(display=TRUE, K=4, stop.if.error=FALSE, n.boot=0, it.max=50))
    #0   287035116.259  (No breakpoint(s)) 
    #1   59188023.560  4 
    #2   84927431.755  3 
    #3   58905175.574  3 
    #4   46487759.098  3 
    #5   39114874.784  3 
    #6   34916433.946  3 
    #7   33986478.337  3 
    #8   33680464.097  3 
    #9   33680464.097  3 

Success! (I'm not sure segmented handles breaks in several variables at the same point very well.)
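The alternative suggested at the start of the answer, supplying explicit starting values instead of psi = NA, might look like the sketch below. It is a sketch under assumptions, not a tested recipe: it assumes the segmented package is installed, regenerates the question's data (with V preallocated), and uses X[200] and X[400] as the two starting values only because those are where the simulated regimes change; the names coefs, k and dat are illustrative. Since the number of breakpoints is then fixed, the error message suggests the default bootstrap restarting should be allowed, so n.boot is left at its default.

```r
# Sketch: fit two breakpoints in X from explicit starting values (assumes 'segmented' is installed)
library(segmented)

# Trapezoidal signal and covariates, as in the question
ref <- c(rep(1, 100), seq(1, 10, 0.05), rep(10, 150), seq(10, 0, -0.05), rep(0, 200))
xx <- cumsum(ref)
yy <- diff(xx)
zz <- diff(yy)
vecL <- length(zz)      # 830 observations after the two diff() calls
xx <- xx[1:vecL]
yy <- yy[1:vecL]

set.seed(10)
X <- xx + max(xx) / 100 * rnorm(vecL)
Y <- yy + max(yy) / 100 * rnorm(vecL)
Z <- zz + max(zz) / 100 * rnorm(vecL)

# Rows of 'coefs' are the question's (a, b, c, d) sets; 'k' is the regime of each observation
coefs <- rbind(c(0.6, 0.2, 0.8, 0.15),
               c(1.6, 1.2, 1.8, 1.15),
               c(3.0, 5.0, 4.0, 2.50))
k <- rep(1:3, times = c(200, 200, 430))
V <- coefs[k, 1] * X + coefs[k, 2] * Y + coefs[k, 3] * Z + coefs[k, 4] +
  0.01 * rnorm(vecL)

dat <- data.frame(V, X, Y, Z)
linearFit <- lm(V ~ X + Y + Z, data = dat)

# Two starting values for the X breakpoints: X where the simulated regime changes
segFit <- segmented(linearFit, seg.Z = ~X, psi = list(X = c(X[200], X[400])))

segFit$psi      # initial values, estimated breakpoints and standard errors
slope(segFit)   # slope of X within each of the three segments
```

Keep in mind that segmented fits a continuous broken line, while the simulated V also jumps in level at i = 200 and i = 400 (the intercept and the Y and Z coefficients change too), so the estimated breakpoints and slopes can only approximate the simulated values.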