Likelihood ratio test: 'models were not all fitted to the same size of dataset'

Time: 2017-11-09 04:36:39

Tags: r testing regression

I am an absolute beginner in R and need some help with the likelihood ratio test for my univariate analysis. Here is the code:

#Univariate analysis for conscientiousness (categorical)
fit <- glm(BCS_Bin~Conscientiousness_cat,data=dat,family=binomial)
summary(fit)

#Likelihood ratio test
library(lmtest)  # provides lrtest()
fit0<-glm(BCS_Bin~1, data=dat, family=binomial)
summary(fit0)
lrtest(fit, fit0)

The output is:

Call:
glm(formula = BCS_Bin ~ Conscientiousness_cat, family = binomial, 
data = dat)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.8847  -0.8847  -0.8439   1.5016   1.5527  

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)    
(Intercept)              -0.84933    0.03461 -24.541   <2e-16 ***
Conscientiousness_catLow  0.11321    0.05526   2.049   0.0405 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 7962.1  on 6439  degrees of freedom
Residual deviance: 7957.9  on 6438  degrees of freedom
(1963 observations deleted due to missingness)
AIC: 7961.9

Number of Fisher Scoring iterations: 4

Call:
glm(formula = BCS_Bin ~ 1, family = binomial, data = dat)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.8524  -0.8524  -0.8524   1.5419   1.5419  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.82535    0.02379  -34.69   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 10251  on 8337  degrees of freedom
Residual deviance: 10251  on 8337  degrees of freedom
(65 observations deleted due to missingness)
AIC: 10253

Number of Fisher Scoring iterations: 4

And for the LRT:

Error in lrtest.default(fit, fit0) : 
models were not all fitted to the same size of dataset

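I gather this happens because different numbers of observations are missing in the two models? The data come from a large questionnaire, and more respondents had dropped out by the questions assessing my predictor variable (conscientiousness) than by those for the outcome variable (body condition score, BCS). So I simply have more data for BCS than for conscientiousness (the same error also occurs for many of my other variables).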

1 answer:

Answer 0 (score: 0)

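To run a likelihood ratio test, the intercept-only model must be fitted to the same observations as the model that includes Conscientiousness_cat. You therefore need the subset of the data in which Conscientiousness_cat has no missing values (BCS_bin below stands for your data frame, which is called dat in your code):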

BCS_bin_subset = BCS_bin[complete.cases(BCS_bin[,"Conscientiousness_cat"]), ]

You can then fit both models on this subset, and the likelihood ratio test should run without error.
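
As a minimal sketch of that workflow, the following uses simulated data rather than the survey data. The data frame dat and its column names mimic the question, but the values and the missingness pattern are invented for illustration. nobs() is used to confirm that both models see the same rows before the test is run.

```r
library(lmtest)  # provides lrtest()

set.seed(1)
n <- 500

# Simulated stand-in for the survey data (hypothetical values)
dat <- data.frame(
  BCS_Bin = rbinom(n, 1, 0.3),
  Conscientiousness_cat = factor(sample(c("High", "Low"), n, replace = TRUE))
)
dat$Conscientiousness_cat[sample(n, 100)] <- NA  # heavier dropout on the predictor
dat$BCS_Bin[sample(n, 10)] <- NA                 # a few missing outcomes

# Keep only the rows where the predictor is observed
dat_subset <- dat[!is.na(dat$Conscientiousness_cat), ]

fit  <- glm(BCS_Bin ~ Conscientiousness_cat, data = dat_subset, family = binomial)
fit0 <- glm(BCS_Bin ~ 1, data = dat_subset, family = binomial)

# Both models now drop the same rows (those with a missing outcome)
same_n <- nobs(fit) == nobs(fit0)
same_n

lrt <- lrtest(fit0, fit)
lrt
```

Rows with a missing outcome are still dropped by glm(), but they are dropped from both models alike, so the two fits stay comparable.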

In your case, you could also do:

BCS_bin_subset = BCS_bin[!is.na(BCS_bin$Conscientiousness_cat), ]

However, complete.cases is handy if you want the subset of a data frame that has no missing values across several variables.
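
To illustrate the multi-variable case, here is a small sketch using the built-in airquality data frame, which already contains missing values in Ozone and Solar.R. The choice of columns is arbitrary and only serves as an example.

```r
# Columns that must all be observed (an arbitrary choice for illustration)
vars <- c("Ozone", "Solar.R", "Wind")

# One call handles all listed columns at once
aq_subset <- airquality[complete.cases(airquality[, vars]), ]

# The equivalent is.na() version grows with every extra variable
aq_subset2 <- airquality[!is.na(airquality$Ozone) &
                         !is.na(airquality$Solar.R) &
                         !is.na(airquality$Wind), ]

same_rows <- identical(aq_subset, aq_subset2)
any_missing <- anyNA(aq_subset[, vars])
```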

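Another option, more convenient if you are going to run many models but also more complicated, is to first fit whichever model uses the largest number of variables from BCS_bin (since that model will exclude the largest number of observations due to missingness) and then use the update function to reduce it to models with fewer variables. We just need to make sure that update uses the same observations each time, which we do with the wrapper function defined below. Here is an example using the built-in mtcars data frame: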

library(lmtest)

dat = mtcars

# Create some missing values in mtcars
dat[1, "wt"] = NA
dat[5, "cyl"] = NA
dat[7, "hp"] = NA

# Wrapper function to ensure the same observations are used for each 
#  updated model as were used in the first model
# From https://stackoverflow.com/a/37341927/496488
update_nested <- function(object, formula., ..., evaluate = TRUE){
  update(object = object, formula. = formula., data = object$model, ..., evaluate = evaluate)
}

m1 = lm(mpg ~ wt + cyl + hp, data=dat)
m2 = update_nested(m1, . ~ . - wt)  # Remove wt
m3 = update_nested(m1, . ~ . - cyl) # Remove cyl
m4 = update_nested(m1, . ~ . - wt - cyl) # Remove wt and cyl
m5 = update_nested(m1, . ~ . - wt - cyl - hp) # Remove all three variables (i.e., model with intercept only)

lrtest(m5,m4,m3,m2,m1)
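
To see why the wrapper matters, the sketch below compares it with a plain update() call on the same mtcars setup. The names m2_plain and m2_nested are introduced here purely for illustration.

```r
dat <- mtcars
dat[1, "wt"]  <- NA
dat[5, "cyl"] <- NA
dat[7, "hp"]  <- NA

# Same wrapper as in the example above: refit on the first model's model frame
update_nested <- function(object, formula., ..., evaluate = TRUE) {
  update(object = object, formula. = formula., data = object$model, ..., evaluate = evaluate)
}

m1 <- lm(mpg ~ wt + cyl + hp, data = dat)

m2_plain  <- update(m1, . ~ . - wt)         # refits on dat, so row 1 (missing wt) returns
m2_nested <- update_nested(m1, . ~ . - wt)  # refits on the rows m1 actually used

nobs(m1)         # 29
nobs(m2_plain)   # 30
nobs(m2_nested)  # 29
```

With plain update(), dropping wt brings back the row whose only missing value was wt, so the two models are fitted to different numbers of observations and lrtest() would stop with the error from the question.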