Programmatically passing family = through glm() to step()

Time: 2016-04-20 16:47:32

Tags: r glm

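I am trying to demonstrate, through simulation, the performance of different models and feature selection techniques, so I want to pass various arguments to glm() programmatically.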

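Under ?glm, we read (italics mine):

family: a description of the error distribution and link function to be used in the model. For glm this can be a character string naming a family function, a family function or the result of a call to a family function. For glm.fit only the third option is supported. (See family for details of family functions.)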
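As a small illustration of the three forms the documentation lists, here is a minimal sketch using the built-in iris data. The object names (fit_string, fit_function, fit_call, fam, fit_var) are made up for this example. The last part shows that glm() stores the family argument in the fitted object's call exactly as it was written, which matters once another function re-evaluates that call.

```r
# Three ways to specify the family, as listed in ?glm
fit_string   <- glm(Petal.Length ~ Petal.Width, family = "gaussian", data = iris)  # name as a string
fit_function <- glm(Petal.Length ~ Petal.Width, family = gaussian,   data = iris)  # the family function
fit_call     <- glm(Petal.Length ~ Petal.Width, family = gaussian(), data = iris)  # result of calling it

# All three describe the same model, so the coefficients agree
all.equal(coef(fit_string), coef(fit_function))
all.equal(coef(fit_string), coef(fit_call))

# When the family comes from a variable, the stored call keeps the
# variable's name, not its value
fam <- "gaussian"
fit_var <- glm(Petal.Length ~ Petal.Width, family = fam, data = iris)
fit_var$call$family   # the symbol fam
```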

The problem is that when I call step() on the resulting model, there seems to be a scoping issue and the family= argument is no longer recognized.

Here is a minimal example:

getCoef <- function(formula, 
                family = c("gaussian", "binomial"),
                data){

  model_fam <- match.arg(family, c("gaussian", "binomial"))

  fit_null <- glm(update(formula,".~1"), 
                   family = model_fam, 
                   data = data)

  message("So far so good")

  fit_stepBIC <- step(fit_null, 
                      formula, 
                      direction="forward",
                      k = log(nrow(data)),
                      trace=0)

  message("Doesn't make it this far")

  fit_stepBIC$coefficients
}

# returns error 'model_fam' not found 
getCoef(Petal.Length ~ Petal.Width + Species, family = "gaussian", data = iris)

Error message with traceback:

> getCoef(Petal.Length ~ Petal.Width + Species, family = "gaussian", data = iris)
So far so good

 Error in stats::glm(formula = Petal.Length ~ Petal.Width + Species, family = model_fam,  : 
  object 'model_fam' not found 
9 stats::glm(formula = Petal.Length ~ Petal.Width + Species, family = model_fam, 
    data = data, method = "model.frame") 
8 eval(expr, envir, enclos) 
7 eval(fcall, env) 
6 model.frame.glm(fob, xlev = object$xlevels) 
5 model.frame(fob, xlev = object$xlevels) 
4 add1.glm(fit, scope$add, scale = scale, trace = trace, k = k, 
    ...) 
3 add1(fit, scope$add, scale = scale, trace = trace, k = k, ...) 
2 step(fit_null, formula, direction = "forward", k = log(nrow(data)), 
    trace = 0) 
1 getCoef(Petal.Length ~ Petal.Width + Species, family = "gaussian", 
    data = iris) 

> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] rsconnect_0.4.1.11 tools_3.2.4       

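What is the most natural way to pass this argument so that it is recognized by step()? One possible workaround I know of is to use if-then-else conditions on model_fam and call glm() with an explicit family name in each branch.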

2 answers:

Answer 0: (score: 2)

I think the following solution based on bquote() may solve your problem.

I also have R version 3.2.4 installed, and I get exactly the same error from your code. The solution below makes it work on my machine.

getCoef <- function(formula, 
                family = c("gaussian", "binomial"),
                data){

  model_fam <- match.arg(family, c("gaussian", "binomial"))

  fit_null <- eval(bquote(
    glm(update(.(formula),".~1"), 
        family = .(model_fam), 
        data = .(data))))

  message("So far so good")

  fit_stepBIC <- step(fit_null, 
                      formula, 
                      direction="forward",
                      k = log(nrow(data)),
                      trace=0)

  message("Doesn't make it this far")

  fit_stepBIC$coefficients
}

# no longer fails with 'model_fam' not found 
getCoef(formula = Petal.Length ~ Petal.Width + Species, family = "gaussian", data = iris)
So far so good
Doesn't make it this far
      (Intercept) Speciesversicolor  Speciesvirginica       Petal.Width 
         1.211397          1.697791          2.276693          1.018712 
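To see why the bquote() approach helps, here is a minimal sketch, separate from the answer's code, of what bquote() does to a call before it is evaluated. The names cl and fit are made up for this illustration, and the model is deliberately simpler than the one in the question.

```r
model_fam <- "gaussian"

# bquote() quotes the expression but replaces each .( ) with its current value
cl <- bquote(glm(Petal.Length ~ 1, family = .(model_fam), data = iris))
print(cl)
# glm(Petal.Length ~ 1, family = "gaussian", data = iris)

# Evaluating the constructed call fits the model as usual
fit <- eval(cl)

# The stored call now holds the string itself, so nothing named
# model_fam has to be looked up when the call is re-evaluated later
fit$call$family
```

One side effect worth knowing about: with data = .(data), as in the answer, the whole data frame is embedded in the stored call, so printing fit$call can become very long.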

Answer 1: (score: 1)

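The problem is that step eventually calls model.frame. model.frame evaluates the terms object in a special environment, namely the environment in which the formula was defined. That is usually the environment from which getCoef is called. But model_fam does not exist in that environment, because it is defined inside getCoef. One way to fix this is to add

environment(formula) <- environment()

after

model_fam <- match.arg(family, c("gaussian", "binomial"))

or something to that effect.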
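Put together, the suggestion might look like the sketch below. The function name getCoefEnv is hypothetical, chosen only to distinguish it from the original getCoef, and the messages are omitted. Everything else follows the question's code with the one extra line added.

```r
# Hypothetical variant of getCoef: the formula's environment is pointed at
# the function's own frame, so that lookups made through the formula can
# find model_fam and data
getCoefEnv <- function(formula,
                       family = c("gaussian", "binomial"),
                       data) {
  model_fam <- match.arg(family, c("gaussian", "binomial"))

  # The one added line: re-home the formula in this function's frame
  environment(formula) <- environment()

  fit_null <- glm(update(formula, ".~1"),
                  family = model_fam,
                  data = data)

  fit_stepBIC <- step(fit_null,
                      formula,
                      direction = "forward",
                      k = log(nrow(data)),
                      trace = 0)

  fit_stepBIC$coefficients
}

coefs <- getCoefEnv(Petal.Length ~ Petal.Width + Species,
                    family = "gaussian", data = iris)
names(coefs)
```

A possible drawback of this design is that the formula no longer carries its original environment. If the formula refers to objects that live only where it was created and are not columns of data, they may stop being found.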