Extracting the coefficients of a factor from an lm object by name

Time: 2014-04-04 12:16:53

Tags: r

I fitted an lm as follows:

data <- data.frame(x=rnorm(50), x2=runif(50), y=rnorm(50), g=rep(1:3,length.out=50))
model <- lm(y ~ x + x2 + factor(g), data=data)

I would like to refer to the coefficient for each level of the factor variable by name, the same way I do for a continuous variable such as 'x':

model$coefficients["x"]

I tried using:

> model$coefficients["g"]
<NA> 
  NA 

but it fails because the levels have been renamed, as can be seen below:

> model$coefficients
(Intercept)           x          x2  factor(g)2  factor(g)3 
 0.60058881  0.01232678 -0.65508242 -0.25919674 -0.04841089
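The new names come from the design matrix that lm() builds. With R's default treatment contrasts, the first level of a factor is the reference and gets no coefficient; every other level gets a column named after the term label with the level appended. A minimal sketch of this, reusing the question's data (the set.seed() call is an addition, only to make it reproducible):

```r
# Same data and model as in the question; set.seed() is added only so the
# sketch is reproducible
set.seed(42)
data <- data.frame(x = rnorm(50), x2 = runif(50), y = rnorm(50),
                   g = rep(1:3, length.out = 50))
model <- lm(y ~ x + x2 + factor(g), data = data)

# The coefficient names are the column names of the design matrix lm() built
X <- model.matrix(model)
print(colnames(X))
# -> "(Intercept)" "x" "x2" "factor(g)2" "factor(g)3"

# Default (treatment) contrasts for g: level "1" is the reference and has no
# column; levels "2" and "3" each get one, appended to the term label "factor(g)"
print(contrasts(factor(data$g)))
```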

I also tried using the displayed name:

model$coefficients["factor(g)2"]

but it doesn't work. How can I do this?

Many thanks.

1 Answer:

Answer 0 (score: 1)

In these situations I always use the coef() function together with grep(). I would do it like this:

data <- data.frame(x=rnorm(50), x2=runif(50), y=rnorm(50), g=rep(1:3,length.out=50))
model <- lm(y ~ x + x2 + factor(g), data=data)
estimates <- coef(model)
# Just get the g:2
estimates[grep("^factor\\(g\\)2", names(estimates))]

# To get the coefficients for both levels, just drop the 2
estimates[grep("^factor\\(g\\)", names(estimates))]

# This case does not really require fancy 
# regular expressions so you could write
estimates[grep("factor(g)", names(estimates), fixed=TRUE)]

# Regular expressions come in much more handy in a more complex situation where
# coefficients have similar names
data <- data.frame(x=rnorm(50), great_g_var=runif(50), y=rnorm(50),
                   g_var=factor(rep(1:3,length.out=50)),
                   g_var2=factor(sample(1:3,size=50, replace=TRUE)))

model <- lm(y ~ x + great_g_var + g_var + g_var2, data=data)
estimates <- coef(model)

# Now if you want to do a simple fixed grep you could end up
# with unexpected estimates
estimates[grep("g_var", names(estimates), fixed=TRUE)]

# Returns:
# great_g_var       g_var2       g_var3      g_var22      g_var23 
# -0.361707955 -0.058988495  0.010967326 -0.008952616 -0.297461520 

# Therefore you may want to use a regular expression; here's how to select g_var
estimates[grep("^g_var[0-9]$", names(estimates))]

# Returns:
# g_var2      g_var3 
# -0.05898849  0.01096733 

# And to select g_var2, write:
estimates[grep("^g_var2[0-9]$", names(estimates))]

# Returns:
# g_var22      g_var23 
# -0.008952616 -0.297461520
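A regular expression has to be tuned to each set of names. An alternative sketch, not part of the original answer, builds the exact coefficient names from the factor levels that lm() stores in model$xlevels and indexes by those names. It assumes the default treatment contrasts; the helper name factor_coefs is hypothetical, and set.seed() is added only for reproducibility:

```r
# Hypothetical helper: return the coefficients that belong to one factor term,
# matched by exact name rather than by regular expression.
factor_coefs <- function(model, term) {
  lvls <- model$xlevels[[term]]
  if (is.null(lvls)) stop("no factor term named ", term)
  # Under treatment contrasts each non-reference level is named <term><level>;
  # intersect() keeps only names that actually appear among the coefficients
  # (the reference level has none)
  wanted <- paste0(term, lvls)
  est <- coef(model)
  est[intersect(wanted, names(est))]
}

# The answer's second example; set.seed() added only for reproducibility
set.seed(1)
data <- data.frame(x = rnorm(50), great_g_var = runif(50), y = rnorm(50),
                   g_var = factor(rep(1:3, length.out = 50)),
                   g_var2 = factor(sample(1:3, size = 50, replace = TRUE)))
model <- lm(y ~ x + great_g_var + g_var + g_var2, data = data)

g_var_est  <- factor_coefs(model, "g_var")   # names: g_var2, g_var3
g_var2_est <- factor_coefs(model, "g_var2")  # names: g_var22, g_var23
print(g_var_est)
print(g_var2_est)
```

Because the names are matched exactly, factor_coefs(model, "g_var") no longer picks up great_g_var or the g_var2 coefficients. For the question's model the term label is the one shown in the coefficient names, e.g. factor_coefs(model, "factor(g)"). The sketch still relies on R's naming rule, so two terms whose label-plus-level strings coincide (a level "22" of g_var versus level "2" of g_var2, both giving g_var22) would remain ambiguous.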