I fitted an lm as follows:
data <- data.frame(x=rnorm(50), x2=runif(50), y=rnorm(50), g=rep(1:3,length.out=50))
model <- lm(y ~ x + x2 + factor(g), data=data)
I would like to refer to the coefficient of each level of the factor variable by name, the way I do for a continuous variable such as 'x':
model$coefficients["x"]
I tried using:
> model$coefficients["g"]
<NA>
NA
But it fails because the levels are renamed, as can be seen below:
> model$coefficients
(Intercept) x x2 factor(g)2 factor(g)3
0.60058881 0.01232678 -0.65508242 -0.25919674 -0.04841089
I also tried using the displayed name:
model$coefficients["factor(g)2"]
But it does not work. How can I do this?
Thanks a lot.
Answer 0 (score: 1)
In these situations I always try to use the coef() function together with grep(). This is how I would do it:
data <- data.frame(x=rnorm(50), x2=runif(50), y=rnorm(50), g=rep(1:3,length.out=50))
model <- lm(y ~ x + x2 + factor(g), data=data)
estimates <- coef(model)
# Just get the g:2
estimates[grep("^factor\\(g\\)2", names(estimates))]
# If you want both levels of the factor, just drop the 2
estimates[grep("^factor\\(g\\)", names(estimates))]
# This case does not really require fancy
# regular expressions so you could write
estimates[grep("factor(g)", names(estimates), fixed=TRUE)]
# This comes in much more handy when you have a more complex situation where
# coefficients have similar names
data <- data.frame(x=rnorm(50), great_g_var=runif(50), y=rnorm(50),
g_var=factor(rep(1:3,length.out=50)),
g_var2=factor(sample(1:3,size=50, replace=TRUE)))
model <- lm(y ~ x + great_g_var + g_var + g_var2, data=data)
estimates <- coef(model)
# Now if you want to do a simple fixed grep you could end up
# with unexpected estimates
estimates[grep("g_var", names(estimates), fixed=TRUE)]
# Returns:
# great_g_var g_var2 g_var3 g_var22 g_var23
# -0.361707955 -0.058988495 0.010967326 -0.008952616 -0.297461520
# Therefore you may want to use regular expressions, here's how you select g_var
estimates[grep("^g_var[0-9]$", names(estimates))]
# Returns:
# g_var2 g_var3
# -0.05898849 0.01096733
# And if you want to have g_var2 you write:
estimates[grep("^g_var2[0-9]$", names(estimates))]
# Returns:
# g_var22 g_var23
# -0.008952616 -0.297461520
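If the regular expressions get hard to maintain, one alternative is to select coefficients by model term instead of by name pattern. The following is only a minimal sketch, not part of the original answer: coef_for_term is a hypothetical helper name, and it assumes the "assign" attribute of the design matrix lines up with coef(model), which is the case for a plain lm fit.

```r
set.seed(1)  # assumption: seed added only to make the sketch reproducible
data <- data.frame(x=rnorm(50), x2=runif(50), y=rnorm(50), g=rep(1:3,length.out=50))
model <- lm(y ~ x + x2 + factor(g), data=data)

# Hypothetical helper: return all coefficients that belong to one model term
coef_for_term <- function(model, term) {
  # Term labels as written in the formula, e.g. "x", "x2", "factor(g)"
  labels <- attr(terms(model), "term.labels")
  idx <- match(term, labels)
  if (is.na(idx)) stop("unknown term: ", term)
  # "assign" maps each design-matrix column to its term (0 = intercept)
  assign <- attr(model.matrix(model), "assign")
  coef(model)[assign == idx]
}

# All level coefficients of the factor, no regular expression needed
coef_for_term(model, "factor(g)")

# A single level can still be picked by its exact printed name
coef(model)["factor(g)2"]
```

With the second model above, coef_for_term(model, "g_var") would pick g_var2 and g_var3 only, because great_g_var and g_var2 are separate terms and no pattern matching on names is involved.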