Question

I am running a logistic regression on three factors, all of which are binary.

My data:

table1 <- expand.grid(Crime = factor(c("Shoplifting", "Other Theft Acts")),
                      Gender = factor(c("Men", "Women")),
                      Priorconv = factor(c("N", "P")))
table1 <- data.frame(table1,
                     Yes = c(24, 52, 48, 22, 17, 60, 15, 4),
                     No = c(1, 9, 3, 2, 6, 34, 6, 3))

And the model:

fit4<-glm(cbind(Yes,No)~Priorconv+Crime+Priorconv:Crime,data=table1,family=binomial)
summary(fit4)

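R seems to be assigning 1 to prior conviction P and 1 to the crime Shoplifting. As a result, the interaction term equals 1 only when both of those are 1. I would now like to try different combinations for the interaction term; for example, I would like to see what it looks like when the prior conviction is P and the crime is not shoplifting.

Is there a way to make R use different categories for the 1 and the 0? That would greatly help my analysis.

Thank you.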

Answer 1

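You are already getting all four combinations of the two categorical variables in your regression. You can see this as follows.

Here is the output of your regression: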

Call:
glm(formula = cbind(Yes, No) ~ Priorconv + Crime + Priorconv:Crime, 
    family = binomial, data = table1)

Coefficients:
                            Estimate Std. Error z value Pr(>|z|)    
(Intercept)                   1.9062     0.3231   5.899 3.66e-09 ***
PriorconvP                   -1.3582     0.3835  -3.542 0.000398 ***
CrimeShoplifting              0.9842     0.6069   1.622 0.104863    
PriorconvP:CrimeShoplifting  -0.5513     0.7249  -0.761 0.446942

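So, for Priorconv, the reference category (the one whose dummy value is 0) is N. For Crime, the reference category is Other Theft Acts (abbreviated Other below). Here is how to interpret the regression results for each of the four possibilities, where log(p/(1-p)) is the log of the odds of a Yes outcome: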

1. Priorconv = N and Crime = Other. This is just the case where both dummies are 
    zero, so your regression is just the intercept:

log(p/(1-p)) = 1.91

2. Priorconv = P and Crime = Other. So the Priorconv dummy equals 1 and the 
   Crime dummy is still zero:

log(p/(1-p)) = 1.91 - 1.36 = 0.55

3. Priorconv = N and Crime = Shoplifting. So the Priorconv dummy is 0 and the 
   Crime dummy is now 1:

log(p/(1-p)) = 1.91 + 0.98 = 2.89

4. Priorconv = P and Crime = Shoplifting. Now both dummies are 1, so the 
   interaction term is included as well:

log(p/(1-p)) = 1.91 - 1.36 + 0.98 - 0.55 = 0.98
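
The four cases above can also be checked numerically. The following is a minimal sketch, assuming the table1 data and fit4 model from the question; the names b and logodds are introduced here only for illustration. It adds up the coefficients for each combination and converts the log-odds to probabilities with plogis(), the inverse logit.

```r
# Data and model as given in the question
table1 <- expand.grid(Crime = factor(c("Shoplifting", "Other Theft Acts")),
                      Gender = factor(c("Men", "Women")),
                      Priorconv = factor(c("N", "P")))
table1 <- data.frame(table1,
                     Yes = c(24, 52, 48, 22, 17, 60, 15, 4),
                     No  = c(1, 9, 3, 2, 6, 34, 6, 3))
fit4 <- glm(cbind(Yes, No) ~ Priorconv + Crime + Priorconv:Crime,
            data = table1, family = binomial)

# Coefficient order: intercept, PriorconvP, CrimeShoplifting, interaction
b <- unname(coef(fit4))

# Log-odds of "Yes" for each combination of the two dummies
# (b and logodds are illustrative names, not part of the original post)
logodds <- c(N_Other       = b[1],
             P_Other       = b[1] + b[2],
             N_Shoplifting = b[1] + b[3],
             P_Shoplifting = b[1] + b[2] + b[3] + b[4])

# plogis(x) is exp(x) / (1 + exp(x)), i.e. the probability of "Yes"
data.frame(logodds = round(logodds, 2), prob = round(plogis(logodds), 3))
```

Because the model contains both main effects and their interaction, each of these four log-odds should match the log of the observed Yes/No odds for that combination, pooled over Gender.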

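You can reorder the factor levels of the two predictor variables, but that will only change which combination of categories falls into each of the four cases above.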

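Update: Regarding the question of how the regression coefficients depend on the ordering of the factor levels. Changing the reference level will change the coefficients, because the coefficients will then represent contrasts between different combinations of categories, but it will not change the predicted probabilities of a Yes or No outcome. (Regression modeling would not be very credible if you could change the predictions just by changing the reference category.) Note that the predicted probabilities are the same even if we switch the reference category for Priorconv.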
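
A minimal sketch of such a check, assuming the data and model from the question: it refits the same model with P as the reference level of Priorconv and compares the fitted probabilities. The names table2, fit4b, p1 and p2 are introduced here only for illustration.

```r
# Data and model as given in the question
table1 <- expand.grid(Crime = factor(c("Shoplifting", "Other Theft Acts")),
                      Gender = factor(c("Men", "Women")),
                      Priorconv = factor(c("N", "P")))
table1 <- data.frame(table1,
                     Yes = c(24, 52, 48, 22, 17, 60, 15, 4),
                     No  = c(1, 9, 3, 2, 6, 34, 6, 3))
fit4 <- glm(cbind(Yes, No) ~ Priorconv + Crime + Priorconv:Crime,
            data = table1, family = binomial)

# Same model, but with P as the reference level for Priorconv
table2 <- table1
table2$Priorconv <- relevel(table2$Priorconv, ref = "P")
fit4b <- glm(cbind(Yes, No) ~ Priorconv + Crime + Priorconv:Crime,
             data = table2, family = binomial)

# The coefficients differ between the two fits ...
coef(fit4)
coef(fit4b)

# ... but the predicted probabilities of "Yes" agree row by row
p1 <- unname(predict(fit4,  type = "response"))
p2 <- unname(predict(fit4b, type = "response"))
all.equal(p1, p2, tolerance = 1e-6)
```

Only the labelling of the contrasts changes: the intercept of fit4b now describes Priorconv = P with Crime = Other, rather than Priorconv = N with Crime = Other.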

Answer 2

I agree with the explanation provided by @eipi10. You can also use relevel() to change the reference level before fitting the model:

levels(table1$Priorconv)
## [1] "N" "P"

table1$Priorconv <- relevel(table1$Priorconv, ref = "P")
levels(table1$Priorconv)
## [1] "P" "N"

m <- glm(cbind(Yes, No) ~ Priorconv*Crime, data = table1, family = binomial)
summary(m)

Note that I changed the formula argument of glm() to use the more compact Priorconv*Crime.
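
To see that the compact form specifies the same model, the two formulas can be expanded with terms(). This is a small sketch; the names long and short are illustrative only.

```r
# In a model formula, a*b is shorthand for a + b + a:b
long  <- cbind(Yes, No) ~ Priorconv + Crime + Priorconv:Crime
short <- cbind(Yes, No) ~ Priorconv * Crime

# terms() expands each formula into its individual model terms
attr(terms(long),  "term.labels")
attr(terms(short), "term.labels")

# Both formulas expand to the same set of terms
identical(attr(terms(long),  "term.labels"),
          attr(terms(short), "term.labels"))
```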

Dummy variables for logistic regression in R
