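In R, after fitting a glm, the summary reports both the residual deviance and the null deviance, which tells you how well your model does compared to a model with only an intercept term. For example, with the model: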
model <- glm(formula = am ~ mpg + qsec, data=mtcars, family=binomial)
we have:
> summary(model)
...
Null deviance: 43.2297 on 31 degrees of freedom
Residual deviance: 7.5043 on 29 degrees of freedom
AIC: 13.504
...
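In Matlab, when you use fitglm, you get back an object of class GeneralizedLinearModel, which has a Deviance property containing the residual deviance. However, I cannot find anything directly related to the null deviance. What is the easiest way to compute it?
Example Matlab code: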
load fisheriris.mat
model = fitglm(meas(:, 1), ismember(species, {'setosa'}), 'Distribution', 'binomial')
This produces:
model =
Generalized Linear regression model:
logit(y) ~ 1 + x1
Distribution = Binomial
Estimated Coefficients:
                       Estimate                SE                tStat               pValue       
                   _________________    _________________    _________________    ____________________
    (Intercept)     27.8285213954246      4.8275686220899     5.76450042948896    8.19000695766331e-09
    x1             -5.17569812610148    0.893399843474784    -5.79326061438645    6.90328570107794e-09
150 observations, 148 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 119, p-value = 9.87e-28
The residual deviance is given by model.Deviance:
>> model.Deviance
ans =
71.8363992272217
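Answer 0 (score: 2)
I wrote a GLM class for Matlab that gives exactly the same results as R.
For example, a log-link GLM with a gamma distribution on sample data gives this in R:
Call: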
glm(formula = MilesPerGallon ~ Horsepower + Acceleration + Cylinders,
family = Gamma(link = log), data = data)
Deviance Residuals:
      Min         1Q     Median         3Q        Max
-0.116817  -0.075084   0.004179   0.060545   0.197108
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)   4.955205   0.509903   9.718  < 2e-16 ***
Horsepower   -0.017605   0.004352  -4.046 5.21e-05 ***
Acceleration -0.026137   0.015540  -1.682   0.0926 .
Cylinders     0.093277   0.054458   1.713   0.0867 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Gamma family taken to be 0.0133)
Null deviance: 0.388832 on 10 degrees of freedom
Residual deviance: 0.093288 on 7 degrees of freedom
AIC: 64.05
Number of Fisher Scoring iterations: 4
Pearson MSE: 0.008783281
Deviance MSE: 0.008480725
McFadden R^2: 0.7600815
Using the package, the same estimation in Matlab gives the following:
:: convergence in 4 iterations
------------------------------------------------------------------------------------------
dependent: MilesPerGallon
independent: (Intercept),Horsepower,Acceleration,Cylinders
------------------------------------------------------------------------------------------
log(E[MilesPerGallon]) = ß1×(Intercept) + ß2×Horsepower + ß3×Acceleration + ß4×Cylinders
------------------------------------------------------------------------------------------
distribution: GAMMA
link: LOG
weight: -
offset: -
============================================================
Variable           Estimate       S.E.    z-value   Pr(>|z|)
============================================================
(Intercept)           4.955      0.510      9.708    0.00000
Horsepower           -0.018      0.004     -4.042    0.00005
Acceleration         -0.026      0.016     -1.680    0.09290
Cylinders             0.093      0.055      1.711    0.08706
============================================================
Residual deviance:    0.0933    Deviance MSE:         0.0085
Null deviance:        0.3888    Pearson MSE:          0.0088
Dispersion:           0.0133    Deviance IC:          0.1026
McFadden R^2:         0.7601    Residual df:          7.0000
============================================================
The output is essentially the same. I hope this helps someone.
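The summary figures in both outputs can be cross-checked from the two deviances alone. The sketch below is in Python only so that it is self-contained, and the variable names are made up for illustration. It assumes that the reported "McFadden R^2" is 1 minus the ratio of residual to null deviance, and that "Deviance MSE" is the residual deviance divided by the 11 observations implied by the 10 null degrees of freedom. Both definitions are inferred from the printed numbers, not taken from the package's documentation.

```python
# Values copied from the R output above.
null_deviance = 0.388832
residual_deviance = 0.093288

# Assumption: null df = n - 1 for an intercept-only fit, so n = 10 + 1.
n_obs = 10 + 1

# Assumed definitions, inferred from the printed figures.
deviance_r2 = 1.0 - residual_deviance / null_deviance
deviance_mse = residual_deviance / n_obs

print(f"R^2 from deviances: {deviance_r2:.4f}")
print(f"Deviance MSE:       {deviance_mse:.4f}")
```

Under those assumptions the two printed values round to 0.7601 and 0.0085, matching the figures shown in both the R and the Matlab output.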
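Answer 1 (score: 0)
If fitglm is called with a table and a regression specified in Wilkinson notation, the resulting GeneralizedLinearModel object, model, has properties that let us retrieve the table used to fit the model, the response name, and the distribution.
Since R's null deviance is just the deviance of a model fitted with only an intercept, we can find it by using this information to fit null_deviance_model: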
% Refit on the same table and distribution, with an intercept-only formula
null_deviance_model = model.fit(model.Variables, ...
[model.ResponseName, ' ~ 1'], 'Distribution', model.Distribution.Name);
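The equivalent of R's null deviance is then given by null_deviance_model.Deviance.
I am not sure whether this extends to regressions that use matrices and vectors for the covariates/response.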
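For a binomial model with a 0/1 response, the intercept-only fit has a closed form: every fitted probability equals the mean of the response. The null deviance can therefore be checked by hand, whether the original fit used a table or matrices. The sketch below is in Python rather than Matlab purely so that it is self-contained. `binary_null_deviance` is a hypothetical helper name, and the counts (13 of 32 for `am` in mtcars, 50 of 150 setosa in fisheriris) are assumptions about those standard datasets.

```python
import math


def binary_null_deviance(n_ones, n_total):
    """Deviance of an intercept-only binomial GLM on a 0/1 response.

    Assumes both classes are present, so that 0 < p < 1.
    """
    n_zeros = n_total - n_ones
    p = n_ones / n_total  # intercept-only fit: fitted value is the response mean
    # Deviance = -2 * log-likelihood here, because the saturated model's
    # log-likelihood is 0 for ungrouped 0/1 data.
    return -2.0 * (n_ones * math.log(p) + n_zeros * math.log(1.0 - p))


mtcars_null = binary_null_deviance(13, 32)   # am in mtcars (assumed counts)
iris_null = binary_null_deviance(50, 150)    # setosa in fisheriris (assumed counts)

print(round(mtcars_null, 4))
print(round(iris_null, 4))
```

The first value agrees with the 43.2297 that R reports for the mtcars model. For the fisheriris model, the second value minus model.Deviance comes to roughly 119, which is consistent with the "Chi^2-statistic vs. constant model: 119" line in the fitglm display, so that statistic appears to be the drop from the null deviance to the residual deviance.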