What is the equivalent of R's data.chisq$residuals in Python?

Time: 2013-12-08 13:04:57

Tags: python r scipy

I have the following data:

array([[33, 250, 196, 136, 32],
       [55, 293, 190,  71, 13]])

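I can get the p-value from stats.chi2_contingency(data).

Is there anything similar to the R object data.chisq$residuals that gives the Pearson residuals and the standardized residuals?

1 Answer:

Answer 0: (score: 5)

You have to compute them separately. Here is a short module that defines functions for these residuals. They take the observed frequencies and the expected frequencies (as returned by chi2_contingency). Note that while chi2_contingency and the residuals function below work for n-dimensional arrays, the stdres implemented here works only for 2D arrays.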

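from __future__ import division  # only needed on Python 2

import numpy as np
from scipy.stats import chi2_contingency  # used in the session below
from scipy.stats.contingency import margins


def residuals(observed, expected):
    # Pearson residuals: (observed - expected) / sqrt(expected)
    return (observed - expected) / np.sqrt(expected)

def stdres(observed, expected):
    # Standardized residuals; this implementation handles 2D tables only.
    # Work in floats so large counts cannot overflow an integer dtype.
    observed = np.asarray(observed, dtype=float)
    n = observed.sum()
    # Row totals (column vector) and column totals (row vector).
    rsum, csum = margins(observed)
    v = csum * rsum * (n - rsum) * (n - csum) / n**3
    return (observed - expected) / np.sqrt(v)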
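
For reference, the two functions above can be written as formulas. The symbols are chosen here for illustration and do not come from the answer itself: O_ij is an observed count, E_ij the expected count, r_i and c_j the row and column totals, and n the grand total. The last equality assumes E_ij = r_i c_j / n, which is the expected frequency under independence.

```latex
\text{residuals:}\quad
e_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}
\qquad
\text{stdres:}\quad
d_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{v_{ij}}},
\quad
v_{ij} = \frac{r_i \, c_j \, (n - r_i)(n - c_j)}{n^3}
       = E_{ij}\left(1 - \frac{r_i}{n}\right)\left(1 - \frac{c_j}{n}\right)
```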

With your data, we get:

>>> F = np.array([[33, 250, 196, 136, 32], [55, 293, 190, 71, 13]])

>>> chi2, p, dof, expected = chi2_contingency(F)

>>> residuals(F, expected)
array([[-1.77162519, -1.61362277, -0.05718356,  2.96508777,  1.89079393],
       [ 1.80687785,  1.64573143,  0.05832142, -3.02408853, -1.92841787]])

>>> stdres(F, expected)
array([[-2.62309082, -3.0471942 , -0.09791681,  4.6295814 ,  2.74991911],
       [ 2.62309082,  3.0471942 ,  0.09791681, -4.6295814 , -2.74991911]])
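
One way to sanity-check the Pearson residuals without R is to use the fact that their squares sum to the chi-squared statistic. The sketch below is self-contained and recomputes the residuals inline rather than importing the module above; the variable name `pearson` is just an illustrative choice.

```python
import numpy as np
from scipy.stats import chi2_contingency

F = np.array([[33, 250, 196, 136, 32], [55, 293, 190, 71, 13]])
chi2, p, dof, expected = chi2_contingency(F)

# Pearson residuals, same formula as residuals() above.
pearson = (F - expected) / np.sqrt(expected)

# chi2_contingency only applies the Yates continuity correction when
# dof == 1. This 2x5 table has dof == 4, so the squared residuals
# should add up to the reported statistic, up to rounding error.
print(dof)
print(np.isclose((pearson ** 2).sum(), chi2))
```

For a 2x2 table the default continuity correction would break this identity, so pass correction=False to chi2_contingency in that case before comparing.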

Here is the same calculation in R, for comparison:

> F <- as.table(rbind(c(33, 250, 196, 136, 32), c(55, 293, 190, 71, 13)))

> result <- chisq.test(F)

> result$residuals
            A           B           C           D           E
A -1.77162519 -1.61362277 -0.05718356  2.96508777  1.89079393
B  1.80687785  1.64573143  0.05832142 -3.02408853 -1.92841787

> result$stdres
            A           B           C           D           E
A -2.62309082 -3.04719420 -0.09791681  4.62958140  2.74991911
B  2.62309082  3.04719420  0.09791681 -4.62958140 -2.74991911
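
To confirm the agreement programmatically rather than by eye, the R values can be pasted into arrays and compared with a tolerance. This is a minimal sketch: it uses expected_freq from scipy.stats.contingency instead of chi2_contingency to get the expected counts, and the names `r_residuals` and `r_stdres` are illustrative. The tolerance reflects the eight decimals that R printed.

```python
import numpy as np
from scipy.stats.contingency import expected_freq, margins

observed = np.array([[33, 250, 196, 136, 32],
                     [55, 293, 190, 71, 13]], dtype=float)
expected = expected_freq(observed)

# Same formulas as residuals() and stdres() in the answer.
n = observed.sum()
rsum, csum = margins(observed)
pearson = (observed - expected) / np.sqrt(expected)
std = (observed - expected) / np.sqrt(
    csum * rsum * (n - rsum) * (n - csum) / n**3)

# Values copied from the R output above (result$residuals, result$stdres).
r_residuals = np.array(
    [[-1.77162519, -1.61362277, -0.05718356, 2.96508777, 1.89079393],
     [1.80687785, 1.64573143, 0.05832142, -3.02408853, -1.92841787]])
r_stdres = np.array(
    [[-2.62309082, -3.04719420, -0.09791681, 4.62958140, 2.74991911],
     [2.62309082, 3.04719420, 0.09791681, -4.62958140, -2.74991911]])

print(np.allclose(pearson, r_residuals, atol=1e-7))
print(np.allclose(std, r_stdres, atol=1e-7))
```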