Constraining the intercept in H2O GLM

Date: 2019-01-16 19:03:16

Tags: r regression h2o

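I am familiar with how to constrain the betas (regression parameters) in h2o.glm(), but I am struggling to understand how to extend this to constrain the intercept.

(I know that intercept=FALSE constrains it to zero, but I am interested in non-zero constraints.)

Notional example data set: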

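library(h2o)
h2o.init()  # start (or connect to) a local H2O cluster

n <- 100

set.seed(1)

getPoints <- function(n){
    rbind(
        data.frame(col= factor('red', levels=c('red','blue')), 
                   x1 = rnorm(n=n,mean=11,sd = 2), 
                   x2 = rnorm(n=n,mean=5,sd=1)),
        data.frame(col='blue', 
                   x1 = rnorm(n=n,mean=13,sd = 2), 
                   x2 = rnorm(n=n,mean=7,sd=1))
        )
}

df1     <- getPoints(n)
# upload to H2O under the key 'df1' so that training_frame = 'df1' resolves
as.h2o(df1, destination_frame = 'df1')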

Example constraints:

param_names <- c('Intercept', 'x1', 'x2')
param_vals  <- c(       27.5, -1.1, -2.7)

beta_const_df <- data.frame(names = c('Intercept','x1','x2'),
                            lower_bounds = param_vals-0.1,
                            upper_bounds = param_vals+0.1,
                            beta_start   = param_vals)

If I omit the 'Intercept' constraint, the constraints work:

glm1 <- h2o.glm(x=c('x1','x2'),
                y='col',
                family='binomial',
                lambda=0,
                alpha=0,
                training_frame = 'df1',
                beta_constraints=beta_const_df[-1,] 
                )
glm1@model$coefficients
# Intercept        x1        x2 
#  27.68408  -1.00000  -2.60000 

However, if I include the 'Intercept' constraint, the other constraints fail as well:

glm2 <- h2o.glm(x=c('x1','x2'),
                y='col',
                family='binomial',
                lambda=0,
                alpha=0,
                training_frame = 'df1',
                beta_constraints=beta_const_df)   
glm2@model$coefficients
#  Intercept          x1          x2 
# 0.67783085 -0.01185921 -0.03083395 
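
To make the failure explicit, the reported coefficients can be compared with the requested bounds. The following is a minimal base-R sketch; check_bounds is a hypothetical helper written for this illustration, not part of the h2o package, and the coefficient values are copied from the glm2 output above.

```r
# Hypothetical helper (not part of h2o): for each constrained coefficient,
# report whether the fitted value lies inside [lower_bounds, upper_bounds].
check_bounds <- function(coefs, constraints, tol = 1e-8) {
  fitted <- unname(coefs[as.character(constraints$names)])
  data.frame(
    names  = constraints$names,
    fitted = fitted,
    lower  = constraints$lower_bounds,
    upper  = constraints$upper_bounds,
    ok     = fitted >= constraints$lower_bounds - tol &
             fitted <= constraints$upper_bounds + tol
  )
}

# Coefficients as printed for glm2, and the bounds used in beta_const_df
coefs_glm2 <- c(Intercept = 0.67783085, x1 = -0.01185921, x2 = -0.03083395)
param_vals <- c(27.5, -1.1, -2.7)
bounds <- data.frame(names        = c('Intercept', 'x1', 'x2'),
                     lower_bounds = param_vals - 0.1,
                     upper_bounds = param_vals + 0.1)

check_bounds(coefs_glm2, bounds)  # the ok column is FALSE for all three rows
```

With a fitted model, the same helper could be called as check_bounds(glm2@model$coefficients, beta_const_df).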

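What is the correct syntax for constraining the intercept?

2 Answers:

Answer 0: (score: 1)

Try setting the standardize parameter to FALSE (as in the code below). You can read more about the beta_constraints parameter in the H2O documentation.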

glm1 <- h2o.glm(x=c('x1','x2'),
                y='col',
                family='binomial',
                lambda=0,
                alpha=0,
                training_frame = as.h2o(df1),
                beta_constraints=beta_const_df,
                standardize = FALSE
)
glm1@model$coefficients
#Intercept        x1        x2 
#27.6      -1.0      -2.6 
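
The answer does not say why standardize matters. One thing that can be shown without H2O is that the intercept of a model fitted on centered and scaled predictors is a different number from the intercept on the original scale, so a bound written for one scale need not make sense on the other. Whether this is the actual cause of the behaviour in h2o.glm is an assumption here, not something the source confirms. The sketch below uses base R's glm() and scale() on similar simulated data; it illustrates the algebra only and is not H2O's implementation.

```r
set.seed(1)
n  <- 100
x1 <- c(rnorm(n, 11, 2), rnorm(n, 13, 2))
x2 <- c(rnorm(n, 5, 1),  rnorm(n, 7, 1))
y  <- rep(c(0, 1), each = n)

# Same logistic model on raw and on standardized predictors
fit_raw <- glm(y ~ x1 + x2, family = binomial)
fit_std <- glm(y ~ scale(x1) + scale(x2), family = binomial)

b_raw <- unname(coef(fit_raw))
b_std <- unname(coef(fit_std))
mu <- c(mean(x1), mean(x2))
s  <- c(sd(x1), sd(x2))

# With z = (x - mu) / s:
#   slope_raw     = slope_std / s
#   intercept_raw = intercept_std - sum(slope_std * mu / s)
slopes    <- b_std[-1] / s
intercept <- b_std[1] - sum(b_std[-1] * mu / s)

# The back-transformed coefficients agree with the raw-scale fit,
# while the two intercepts themselves differ.
all.equal(c(intercept, slopes), b_raw, tolerance = 1e-4)
c(raw = b_raw[1], standardized = b_std[1])
```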

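Answer 1: (score: 0)

A workaround if all the constraints are strict equalities

I can impose a heavy L2 penalty, rho, on deviations from beta_given, and the Intercept appears to be supported there: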

beta_const_df <- data.frame(names = c('Intercept','x1','x2'),
                            #lower_bounds = param_vals-0.1, #don't bound
                            #upper_bounds = param_vals+0.1,
                            #beta_start   = param_vals, # use beta_given
                            beta_given   = param_vals, # new
                            rho          = 1e9 )       # new

Then this works:

glm2 <- h2o.glm(x=c('x1','x2'),
                y='col',
                family='binomial',
                lambda=0,
                alpha=0,
                training_frame = 'df1',
                beta_constraints=beta_const_df)

glm2@model$coefficients
# Intercept        x1        x2 
#      27.5      -1.1      -2.7 
all.equal(glm2@model$coefficients, param_vals, check.names=FALSE) # TRUE

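This approach only works when all of your constraints are equality constraints (with no explicit upper and lower bounds).

Either way, I would still like to know whether there is a simpler way.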
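
The heavy-penalty idea in this answer can be illustrated without H2O. The sketch below minimizes a logistic negative log-likelihood plus a quadratic penalty on the distance from beta_given using base R's optim(). The penalty form 0.5 * rho * sum((beta - beta_given)^2) and the use of the mean log-likelihood are assumptions made for illustration and may differ from the objective H2O actually optimizes; penalized_nll and fit_pen are hypothetical names.

```r
set.seed(1)
n  <- 100
x1 <- c(rnorm(n, 11, 2), rnorm(n, 13, 2))
x2 <- c(rnorm(n, 5, 1),  rnorm(n, 7, 1))
y  <- rep(c(0, 1), each = n)
X  <- cbind(Intercept = 1, x1 = x1, x2 = x2)

beta_given <- c(27.5, -1.1, -2.7)

# Mean logistic negative log-likelihood plus the assumed quadratic penalty.
# log(1 + exp(eta)) is computed in a numerically stable form.
penalized_nll <- function(beta, rho) {
  eta      <- drop(X %*% beta)
  softplus <- pmax(eta, 0) + log1p(exp(-abs(eta)))
  mean(softplus - y * eta) + 0.5 * rho * sum((beta - beta_given)^2)
}

# Fit for a given penalty weight, starting from beta_given
fit_pen <- function(rho) {
  optim(beta_given, penalized_nll, rho = rho, method = "BFGS")$par
}

fit_pen(0)    # no penalty: free to move away from beta_given
fit_pen(1e4)  # heavy penalty: stays very close to beta_given
```

The larger rho is, the more the penalty dominates the likelihood, which is why this behaves like an equality constraint and cannot express a range with distinct lower and upper bounds.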