Question

I have a logistic regression model built with the glmnet package. My response variable is coded as a factor whose levels I'll call "a" and "b".

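The mathematics of logistic regression labels one of the two classes "0" and the other "1". The feature coefficients of a logistic regression model are positive, negative, or zero. If a feature "f" has a positive coefficient, then increasing the value of "f" for a test observation x increases the probability that the model classifies x as class "1".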

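My question is: given a glmnet model, how do you know how glmnet maps your data's factor labels {"a", "b"} to the underlying mathematics' labels {"0", "1"}? You need to know this to interpret the model's coefficients correctly.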

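You can work this out manually by experimenting with the output of the predict function on toy observations. But it would be nice to know how glmnet implicitly handles that mapping, to speed up interpretation.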

Thanks!

Answer 1

See ?glmnet (page 9 of https://cran.r-project.org/web/packages/glmnet/glmnet.pdf):

y

response variable. ... For family="binomial" should be either a factor
with two levels, or a two-column matrix of counts or proportions (the 
second column is treated as the target class; for a factor, the last
level in alphabetical order is the target class) ...

Isn't it clear now? If your factor levels are "a" and "b", then "a" is coded as 0 and "b" is coded as 1.

This treatment is really standard. It comes down to how R codes factor levels automatically, or how you code them yourself. Take a look:

## automatic coding by R based on alphabetical order
set.seed(0); y1 <- factor(sample(letters[1:2], 10, replace = TRUE))
## manual coding
set.seed(0); y2 <- factor(sample(letters[1:2], 10, replace = TRUE),
                   levels = c("b", "a"))

# > y1
# [1] b a a b b a b b b b
# Levels: a b
# > y2
# [1] b a a b b a b b b b
# Levels: b a

# > levels(y1)
# [1] "a" "b"
# > levels(y2)
# [1] "b" "a"
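
To make the 0/1 mapping explicit, here is a minimal base-R sketch. It assumes, as described above, that the second factor level is the one treated as class "1". The variable names (labels, target1, code1, and so on) are illustrative only and are not part of glmnet.

```r
# Same labels, two different level orders
labels <- c("b", "a", "a", "b", "b")
y1 <- factor(labels)                        # levels: a, b (alphabetical)
y2 <- factor(labels, levels = c("b", "a"))  # levels: b, a (set manually)

# The second level is the one assumed to be coded as 1
target1 <- levels(y1)[2]
target2 <- levels(y2)[2]

# 0/1 coding implied by the level order (internal integer code minus 1)
code1 <- as.integer(y1) - 1L
code2 <- as.integer(y2) - 1L

print(data.frame(labels, code1, code2))
```

So checking levels(y)[2] before fitting should be enough to tell which class a positive coefficient points toward.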

The same thing happens whether you use glmnet() or glm().
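
The effect on coefficient signs can be checked with glm() from base R, which needs no extra packages. The sketch below uses simulated data (an assumption made purely for illustration) in which larger x makes "b" more likely, then fits the same model under both level orders.

```r
set.seed(1)
x <- rnorm(200)
# Simulated response: larger x makes "b" more likely
y <- factor(ifelse(runif(200) < plogis(2 * x), "b", "a"))  # levels: a, b

# Levels a, b: the model describes P(y == "b")
fit_ab <- glm(y ~ x, family = binomial)

# Same data with the level order reversed: the model describes P(y == "a")
y_rev <- factor(y, levels = c("b", "a"))
fit_ba <- glm(y_rev ~ x, family = binomial)

print(coef(fit_ab))  # slope on x is positive
print(coef(fit_ba))  # same magnitudes, opposite signs
```

If you would rather have a particular class act as "1", reordering the levels before fitting, for example with factor(y, levels = ...) or relevel(), is one way to do it.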

glmnet: How do I know which factor level of my response is coded as 1 in logistic regression?
