Chi-squared test for nested GLMs

Date: 2019-05-26 10:12:31

Tags: r statistics glm microsoft-r

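I am using the RevoScaleR package to create various nested GLMs. I would like to run chi-squared tests on pairs of nested models to help me assess goodness of fit.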

Can anyone tell me how to do this?
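For reference, the chi-squared test for two nested GLMs is the likelihood-ratio (deviance) test. Writing $D$ for a model's residual deviance, $\mathrm{df}$ for its residual degrees of freedom and $\hat\phi$ for the dispersion (fixed at 1 for Poisson and binomial families, otherwise estimated from the larger model), the statistic is approximately chi-squared under the null hypothesis that the extra terms in the full model have zero coefficients:

$$
G^2 = \frac{D_{\mathrm{reduced}} - D_{\mathrm{full}}}{\hat\phi} \;\dot\sim\; \chi^2_{\nu},
\qquad \nu = \mathrm{df}_{\mathrm{reduced}} - \mathrm{df}_{\mathrm{full}},
\qquad p = \Pr\!\left(\chi^2_{\nu} \ge G^2\right)
$$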

Code example below:

rm(list = ls())

library(RevoScaleR)
library(fueleconomy) # provides the "vehicles" data frame

### Produce a GLM using the built in glm() function

glm_1 <- glm(hwy ~ 1 + year + class + trans + 
          drive + cyl + displ + fuel, data = vehicles)
AIC1 <- glm_1$aic 

### Produce a second GLM using the built in glm() function - but remove "fuel" from model

glm_2 <- glm(hwy ~ 1 + year + class + trans + 
          drive + cyl + displ, data = vehicles)
AIC2 <- glm_2$aic

### Produce same models using rxGlm()

vehicles$class <- as.factor(vehicles$class)
vehicles$trans <- as.factor(vehicles$trans)
vehicles$drive <- as.factor(vehicles$drive)
vehicles$fuel <- as.factor(vehicles$fuel)

rxGlm_1 <- rxGlm(hwy ~ 1 + year + class + trans + drive + cyl + displ + fuel, data = vehicles, computeAIC = TRUE)
rxAIC1 <- rxGlm_1$aic[1]

rxGlm_2 <- rxGlm(hwy ~ 1 + year + class + trans + drive + cyl + displ, data = vehicles, computeAIC = TRUE)
rxAIC2 <- rxGlm_2$aic[1]

### play with anova() function on the model objects created using glm()

anova(glm_1, test = "Chisq") # works ok
anova(glm_2, test = "Chisq") # works ok

anova(glm_1, glm_2, test = "Chisq") # works ok and I think this is then a chi squared test for the two (nested) models :-)

### play with anova() function on the model objects created using rxGlm()
### anova() can't accept an rxGlm() model object so try to convert using as.glm() function...?

anova(as.glm(rxGlm_1), test = "Chisq") # doesn't work - "Error in `contrasts<-`(`*tmp*`, ncol(ca), value = ca) : wrong number of contrast matrix rows"
anova(as.glm(rxGlm_2), test = "Chisq") # doesn't work - "Error in call(if (is.function(method)) "method" else method, x = x[, varseq <=  : first argument must be a character string"

anova(as.glm(rxGlm_1), as.glm(rxGlm_2), test = "Chisq") # doesn't work - "Error in qr.lm(object) : lm object does not have a proper 'qr' component. Rank zero or should not have used lm(.., qr=FALSE)."
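The `anova(glm_1, glm_2, test = "Chisq")` call that works above is a likelihood-ratio test, and for a family with dispersion fixed at 1 it needs nothing beyond each model's residual deviance and residual degrees of freedom. A minimal sketch confirming this on simulated Poisson data; the data frame `d` and the models `full` and `reduced` are hypothetical stand-ins, not the `vehicles` data:

```r
# Simulated stand-in data (hypothetical; not the fueleconomy "vehicles" data).
set.seed(1)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rpois(n, lambda = exp(0.5 + 0.3 * d$x1))

# Two nested Poisson GLMs: "reduced" drops x2, as glm_2 drops fuel.
full    <- glm(y ~ x1 + x2, family = poisson(), data = d)
reduced <- glm(y ~ x1,      family = poisson(), data = d)

# Chi-squared test via anova(); the model order only flips the signs
# of the Df and Deviance columns, not the p-value.
tab <- anova(reduced, full, test = "Chisq")

# The same test by hand, from residual deviance and residual df only.
lr_stat  <- deviance(reduced) - deviance(full)
lr_df    <- df.residual(reduced) - df.residual(full)
p_manual <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(tab)
print(c(statistic = lr_stat, df = lr_df, p.value = p_manual))
```

The p-value in the second row of `tab` should match `p_manual`, which suggests the test can be reproduced even when `anova()` itself cannot handle the model objects.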

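You may be wondering why I create GLMs with both glm() and rxGlm(). The answer is that I often need to fit GLMs to very large customer datasets, and using rxGlm() from the RevoScaleR package gives a huge performance advantage.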

str(as.glm(rxGlm_1)) produces:

> str(as.glm(rxGlm_1))
List of 10
 $ coefficients: Named num [1:99] -6.12804 0.00472 0.01101 -0.03666 -0.0033 ...
  ..- attr(*, "names")= chr [1:99] "(Intercept)" "year" "classLarge Cars" "classMidsize-Large Station Wagons" ...
 $ rank        : int 99
 $ df.residual : num 33285
 $ contrasts   :List of 4
  ..$ class: chr "contr.treatment"
  ..$ trans: chr "contr.treatment"
  ..$ drive: chr "contr.treatment"
  ..$ fuel : chr "contr.treatment"
 $ xlevels     :List of 4
  ..$ class: chr [1:34] "Compact Cars" "Large Cars" "Midsize-Large Station Wagons" "Midsize Cars" ...
  ..$ trans: chr [1:47] "Auto (AV-S6)" "Auto (AV-S8)" "Auto (AV)" "Auto(A1)" ...
  ..$ drive: chr [1:7] "2-Wheel Drive" "4-Wheel Drive" "4-Wheel or All-Wheel Drive" "All-Wheel Drive" ...
  ..$ fuel : chr [1:13] "CNG" "Diesel" "Electricity" "Gasoline or E85" ...
 $ call        : language glm(formula = hwy ~ 1 + year + class + trans + drive + cyl + displ + fuel, data = vehicles, family = poisson(), d| __truncated__
 $ terms       :Classes 'terms', 'formula'  language hwy ~ 1 + year + class + trans + drive + cyl + displ + fuel
  .. ..- attr(*, "variables")= language list(hwy, year, class, trans, drive, cyl, displ, fuel)
  .. ..- attr(*, "factors")= int [1:8, 1:7] 0 1 0 0 0 0 0 0 0 0 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:8] "hwy" "year" "class" "trans" ...
  .. .. .. ..$ : chr [1:7] "year" "class" "trans" "drive" ...
  .. ..- attr(*, "term.labels")= chr [1:7] "year" "class" "trans" "drive" ...
  .. ..- attr(*, "order")= int [1:7] 1 1 1 1 1 1 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. ..- attr(*, "dataClasses")= Named chr [1:9] "numeric" "numeric" "factor" "numeric" ...
  .. .. ..- attr(*, "names")= chr [1:9] "hwy" "year" "class" "trans" ...
 $ iter        : int 4
 $ deviance    : num 8366
 $ family      :List of 12
  ..$ family    : chr "poisson"
  ..$ link      : chr "log"
  ..$ linkfun   :function (mu)  
  ..$ linkinv   :function (eta)  
  ..$ variance  :function (mu)  
  ..$ dev.resids:function (y, mu, wt)  
  ..$ aic       :function (y, n, mu, wt, dev)  
  ..$ mu.eta    :function (eta)  
  ..$ initialize:  expression({  if (any(y < 0))  stop("negative values not allowed for the 'Poisson' family")  n <- rep.int(1, nobs| __truncated__
  ..$ validmu   :function (mu)  
  ..$ valideta  :function (eta)  
  ..$ simulate  :function (object, nsim)  
  ..- attr(*, "class")= chr "family"
 - attr(*, "class")= chr [1:2] "glm" "lm"
> 
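The `str()` output shows that the object returned by `as.glm()` has no `qr` component (which is what the last `anova()` error complains about), but it does keep `$deviance` and `$df.residual`. Assuming `as.glm(rxGlm_2)` carries the same two fields, the chi-squared test can be computed from them directly. `lr_chisq_test()` below is a hypothetical helper written for this purpose, not part of RevoScaleR:

```r
# Hypothetical helper (not part of RevoScaleR): likelihood-ratio chi-squared
# test for two nested GLM fits, using only their residual deviance and
# residual degrees of freedom. `full` and `reduced` can be any objects with
# $deviance and $df.residual, e.g. the lists returned by as.glm().
lr_chisq_test <- function(full, reduced, dispersion = 1) {
  # Scaled drop in deviance from adding the extra terms.
  stat <- (reduced$deviance - full$deviance) / dispersion
  # Number of extra parameters in the full model.
  df <- reduced$df.residual - full$df.residual
  if (df <= 0) {
    stop("`reduced` must have more residual df than `full` (are the models nested?)")
  }
  p <- pchisq(stat, df = df, lower.tail = FALSE)
  c(statistic = stat, df = df, p.value = p)
}
```

For the Poisson fits shown above, the call would be `lr_chisq_test(as.glm(rxGlm_1), as.glm(rxGlm_2))`. For a Gaussian fit with unit prior weights, passing `dispersion = full$deviance / full$df.residual` mirrors the dispersion estimate that `anova()` uses for glm objects. The test is only meaningful if both models were fitted to the same rows; if `fuel` has missing values, dropping it can change which rows are used.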

0 answers