What is the difference between an offset and weights in glm.nb (MASS package) in R?

Time: 2018-12-26 14:32:28

Tags: r offset glm

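I am trying to capture the salience, or dominance, of a topic in a document. The measure of salience is the number of words devoted to that topic. However, I need to control for the fact that the documents differ in length (TOTAL_WORDS: mean = 2,444 words, SD = 1,379 words, min = 561, max = 8,342, range = 7,781 words). If I use a negative binomial model (glm.nb), should Total_Words be an offset or a weight? Second, if I use Total_Words as an offset, should it enter as the log of the offset, as in Poisson regression?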
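As a sketch of what an offset means under a log link (the default link in glm.nb), write \mu_i for the expected topic word count of document i, T_i for its total word count, and x_i for a predictor. These symbols are introduced here only for illustration. An offset enters the linear predictor with its coefficient fixed at 1:

```latex
\log \mu_i = \log T_i + \beta_0 + \beta_1 x_i
\quad\Longleftrightarrow\quad
\log\!\left(\frac{\mu_i}{T_i}\right) = \beta_0 + \beta_1 x_i
```

Read this way, the coefficients describe the rate of topic words per total word, and the logged form, offset(log(T_i)), mirrors the usual Poisson rate model.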

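I tried running the model with either an offset or weights and got very different results; my coefficients are statistically significant only when I use weights. I looked at the package documentation, which says: "For a binomial GLM prior weights are used to give the number of trials when the response is the proportion of successes". Does that mean weights would be acceptable in my case?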
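For contrast, a hedged sketch of what prior weights do in a likelihood-based fit: each observation's log-likelihood contribution \ell_i is multiplied by its weight w_i, so an integer weight behaves like w_i identical copies of that row. The symbols are assumptions for illustration (w_i standing for TOTAL WORDS, y_i for Problem_Demand):

```latex
\ell_w(\beta, \theta) = \sum_{i=1}^{n} w_i \, \ell_i(\beta, \theta;\, y_i),
\qquad
\log \mu_i = \beta_0 + \beta_1 x_i
```

Under this reading the mean is not adjusted for document length at all, while the effective sample size grows from n toward \sum_i w_i, which would shrink the standard errors sharply. The quoted sentence about binomial GLMs concerns responses expressed as proportions of successes, which is a different setup from a count response.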

Offset code

summary(m1 <- glm.nb(Problem_Demand ~  HEALTH_CJ + offset(log(`TOTAL WORDS`))))

Weights code

summary(m2 <- glm.nb(Problem_Demand ~  HEALTH_CJ, weights=Dissertation_Dataset$`TOTAL WORDS`))
Offset results:
Call:
glm.nb(formula = Problem_Demand ~ HEALTH_CJ + 
    offset(log(`TOTAL WORDS`)), init.theta = 0.1490825725, 
    link = log)

Deviance Residuals:
    Min        1Q    Median        3Q       Max  
-1.55538  -1.41229  -0.45314   0.00276   1.87925  

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)
(Intercept)               -2.5384     0.2897  -8.762   <2e-16
HEALTH_CJLaw Enforcement  -0.6883     0.4796  -1.435    0.151
HEALTH_CJOther             0.3187     0.6031   0.529    0.597

(Dispersion parameter for Negative Binomial(0.1491) family taken to be 1)

    Null deviance: 154.04  on 149  degrees of freedom
Residual deviance: 151.23  on 147  degrees of freedom
AIC: 1400

Number of Fisher Scoring iterations: 1


              Theta:  0.1491 
          Std. Err.:  0.0183 

 2 x log-likelihood:  -1391.9620 
Weights results:
Call:
glm.nb(formula = Problem_Demand ~ HEALTH_CJ, 
    weights = `TOTAL WORDS`, init.theta = 0.1458893113, 
    link = log)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-121.467   -62.381   -21.260    -3.179   108.458  

Coefficients:
                          Estimate Std. Error z value Pr(>|z|)
(Intercept)               5.297791   0.005737  923.48   <2e-16
HEALTH_CJLaw Enforcement -1.163340   0.009350 -124.42   <2e-16
HEALTH_CJOther            0.529726   0.014012   37.81   <2e-16


(Dispersion parameter for Negative Binomial(0.1459) family taken to be 1)

    Null deviance: 391806  on 149  degrees of freedom
Residual deviance: 373685  on 147  degrees of freedom
AIC: 3483728

Number of Fisher Scoring iterations: 1


              Theta:  0.145889 
          Std. Err.:  0.000362 

 2 x log-likelihood:  -3483720.172000 
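
The contrast between the two outputs above can be explored on simulated data. The following is a minimal sketch, not the original analysis: the data frame `d`, its columns, the chosen rates, and the helper `se` are all hypothetical stand-ins, and only `glm.nb` and `rnegbin` come from MASS.

```r
library(MASS)  # provides glm.nb() and rnegbin()

set.seed(1)
n <- 150

# Hypothetical stand-ins for the question's variables (not the real data)
d <- data.frame(
  total_words = round(runif(n, 561, 8342)),
  group = factor(sample(c("Health", "Law Enforcement", "Other"),
                        n, replace = TRUE))
)

# Assumed per-word rate of topic words; counts scale with document length
rate <- exp(-4 + 0.5 * (d$group == "Other"))
d$y <- rnegbin(n, mu = rate * d$total_words, theta = 2)

# Offset: models the count per total word (coefficient of log exposure fixed at 1)
m_offset <- glm.nb(y ~ group + offset(log(total_words)), data = d)

# Weights: treats each row as if it had been observed total_words times
m_weight <- glm.nb(y ~ group, weights = total_words, data = d)

# Helper (hypothetical) to pull standard errors from a fitted model
se <- function(m) coef(summary(m))[, "Std. Error"]

print(se(m_offset))
print(se(m_weight))
```

In a sketch like this the weighted fit would be expected to report far smaller standard errors, because the weights inflate the apparent amount of data rather than adjusting the mean for document length.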

0 answers:

No answers