Question

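I am trying to capture the salience, or dominance, of a topic within a document. My measure of salience is the number of words on that topic. However, I need to control for the fact that the documents differ in length (TOTAL_WORDS: mean = 2,444 words, SD = 1,379 words, min = 561, max = 8,342, range = 7,781 words). If I use a negative binomial model (glm.nb), should Total_Words be an offset or a weight? Second, if I use Total_Words as an offset, should it enter as the log of the offset, as in Poisson regression?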

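I tried running the model with an offset and with weights, and the results are very different: my coefficients are statistically significant only when I use weights. I checked the package documentation, which says: "For a binomial GLM prior weights are used to give the number of trials when the response is the proportion of successes". Does that mean weights are acceptable in my case?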

Offset code

summary(m1 <- glm.nb(Problem_Demand ~  HEALTH_CJ + offset(log(`TOTAL WORDS`))))
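
For reference, here is a sketch of the model this call appears to specify, assuming the default log link of glm.nb. The symbols $y_i$ (the Problem_Demand count for document $i$), $T_i$ (its TOTAL WORDS), $x_i$ (the HEALTH_CJ dummy variables), and $\theta$ (the dispersion parameter) are notation introduced here for illustration, not taken from the post.

```latex
\begin{aligned}
y_i &\sim \mathrm{NegBin}(\mu_i, \theta), \\
\log \mu_i &= \log T_i + x_i^\top \beta
\quad\Longleftrightarrow\quad
\log\!\left(\frac{\mu_i}{T_i}\right) = x_i^\top \beta .
\end{aligned}
```

Read this way, the offset enters the linear predictor with its coefficient fixed at 1, so under a log link it is supplied as log(TOTAL WORDS), as in Poisson regression, and the coefficients describe the expected topic words per word of document.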

Weights code

summary(m2 <- glm.nb(Problem_Demand ~  HEALTH_CJ, weights=Dissertation_Dataset$`TOTAL WORDS`))
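
As a point of comparison, here is a sketch of what prior weights do in a call like this, assuming glm.nb maximizes a weighted log-likelihood in which each observation's contribution is multiplied by its weight. The symbols $w_i$, $\ell_i$, $T_i$ (TOTAL WORDS), and $x_i$ (the HEALTH_CJ dummy variables) are notation introduced here for illustration.

```latex
\ell_w(\beta, \theta) = \sum_{i=1}^{n} w_i \, \ell_i(\beta, \theta),
\qquad w_i = T_i,
\qquad \log \mu_i = x_i^\top \beta .
```

Under this reading, a document with 2,444 words counts as if it were 2,444 identical observations, while the mean $\mu_i$ is still the raw count rather than a per-word rate. That would be consistent with the very small standard errors and the much larger log-likelihood in the weighted output, though it is an interpretation and not something the post confirms.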

Offset results:

Call:
glm.nb(formula = Problem_Demand ~ HEALTH_CJ + 
    offset(log(`TOTAL WORDS`)), init.theta = 0.1490825725, 
    link = log)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.55538  -1.41229  -0.45314   0.00276   1.87925  

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)
(Intercept)               -2.5384     0.2897  -8.762   <2e-16
HEALTH_CJLaw Enforcement  -0.6883     0.4796  -1.435    0.151
HEALTH_CJOther             0.3187     0.6031   0.529    0.597

(Dispersion parameter for Negative Binomial(0.1491) family taken to be 1)

    Null deviance: 154.04  on 149  degrees of freedom
Residual deviance: 151.23  on 147  degrees of freedom
AIC: 1400

Number of Fisher Scoring iterations: 1


              Theta:  0.1491 
          Std. Err.:  0.0183 

 2 x log-likelihood:  -1391.9620

Weights results:

Call:
glm.nb(formula = Problem_Demand ~ HEALTH_CJ, 
    weights = `TOTAL WORDS`, init.theta = 0.1458893113, 
    link = log)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-121.467   -62.381   -21.260    -3.179   108.458  

Coefficients:
                          Estimate Std. Error z value Pr(>|z|)
(Intercept)               5.297791   0.005737  923.48   <2e-16
HEALTH_CJLaw Enforcement -1.163340   0.009350 -124.42   <2e-16
HEALTH_CJOther            0.529726   0.014012   37.81   <2e-16

(Dispersion parameter for Negative Binomial(0.1459) family taken to be 1)

    Null deviance: 391806  on 149  degrees of freedom
Residual deviance: 373685  on 147  degrees of freedom
AIC: 3483728

Number of Fisher Scoring iterations: 1


              Theta:  0.145889 
          Std. Err.:  0.000362 

 2 x log-likelihood:  -3483720.172000

What is the difference between an offset and weights in glm.nb in R?

0 answers: