Question

I am trying to get the F-statistic and p-value for each covariate in a GLM. In Python, I am using statsmodels.formula.api to fit the GLM.

formula = 'PropNo_Pred ~ Geography + log10BMI + Cat_OpCavity + CatLes_neles + CatRural_urban + \
        CatPred_Control + CatNative_Intro + Midpoint_of_study'

mod1 = smf.glm(formula=formula, data=A2, family=sm.families.Binomial()).fit()
mod1.summary()

After that, I tried to run an ANOVA test on this model using anova_lm from statsmodels.stats.anova:

table1 = anova_lm(mod1)
print(table1)

However, I get an error saying: 'GLMResults' object has no attribute 'ssr'

It looks like this anova_lm function only applies to linear models. Is there a module in Python that can run an ANOVA test on a GLM?

Answer 1

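Unfortunately, there isn't. However, you can roll your own by using the model's hypothesis-testing methods on each of the terms. In fact, some of their ANOVA methods do not even use the attribute ssr (which is the model's sum of squared residuals, and is therefore undefined for a binomial GLM). You could probably modify this code to do a GLM ANOVA.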

Answer 2

Here is my attempt at rolling my own.

The F-statistic for nested models is defined as:

(D_s - D_b ) / (addtl_parameters * phi_b)

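where:

D_s is the deviance of the small model
D_b is the deviance of the larger ("big") model
addtl_parameters is the difference in degrees of freedom between the models
phi_b is the estimate of the dispersion parameter for the larger model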

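"Statistical theory says that the F-statistic follows an F distribution, with numerator degrees of freedom equal to the number of added parameters and denominator degrees of freedom equal to n - p_b, that is, the number of records minus the number of parameters in the big model."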

We translate this to code as follows:

from scipy import stats

def calculate_nested_f_statistic(small_model, big_model):
    """Given two fitted GLMs, the larger of which contains the parameter space of the smaller, return the F Stat and P value corresponding to the larger model adding explanatory power"""
    addtl_params = big_model.df_model - small_model.df_model
    f_stat = (small_model.deviance - big_model.deviance) / (addtl_params * big_model.scale)
    df_numerator = addtl_params
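    # use fitted values to obtain n_obs from model object.
    # Note: statsmodels' df_model does not count the intercept, so
    # big_model.df_resid is the exact n - p_b from the definition above.
    df_denom = (big_model.fittedvalues.shape[0] - big_model.df_model)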
    p_value = stats.f.sf(f_stat, df_numerator, df_denom)
    return (f_stat, p_value)

Here is a reproducible example, following the gamma GLM example in the statsmodels documentation (https://www.statsmodels.org/stable/glm.html):

import numpy as np
import statsmodels.api as sm
data2 = sm.datasets.scotland.load()
data2.exog = sm.add_constant(data2.exog, prepend=False)

big_model = sm.GLM(data2.endog, data2.exog, family=sm.families.Gamma()).fit()
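# Drop one covariate (the first column). np.asarray makes the slice work
# whether load() returned NumPy arrays or pandas objects:
smaller_model = sm.GLM(data2.endog, np.asarray(data2.exog)[:, 1:], family=sm.families.Gamma()).fit()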

# Using function defined in answer:
calculate_nested_f_statistic(smaller_model, big_model)
# (9.519052917304652, 0.004914748992474178)

Source: https://www.casact.org/pubs/monographs/papers/05-Goldburd-Khare-Tevet.pdf
