Question

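I'm new to chi-squared testing, and I'm trying to figure out what the 'standard' way is to run a chi-squared test and also get a 95% confidence interval on the difference between success rates in two experiments.

My data looks like this: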

Condition A:    25           75           100
Condition B:    100          100          200
Total:          125          175

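These numbers are the counts observed during the experiment. As you can see, Condition A and Condition B have different sample sizes.

What I would like to get is:

A test statistic that indicates whether the success rate of 33% for Condition A is statistically different from the success rate of 50% for Condition B.
I would also like a 95% confidence interval for the difference between the two success rates.

It seems that scipy.stats.chisquare expects the user to adjust the 'expected' counts so that they appear to be taken from the same sample size as the 'observed' counts. Is that the only transformation I need to do? If not, what else do I need to do? Finally, how would I go about calculating the 95% confidence interval for the difference in proportions?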

Answer 1

You have a contingency table. To perform the χ² test on this data, you can use scipy.stats.chi2_contingency:

In [30]: import numpy as np

In [31]: from scipy.stats import chi2_contingency

In [32]: obs = np.array([[25, 75], [100, 100]])

In [33]: obs
Out[33]: 
array([[ 25,  75],
       [100, 100]])

In [34]: chi2, p, dof, expected = chi2_contingency(obs)

In [35]: p
Out[35]: 5.9148695289823149e-05
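On the question of adjusting the 'expected' counts by hand: as far as I can tell, chi2_contingency derives them from the row and column totals itself, so no manual rescaling should be needed. The sketch below (assuming Python 3 and a reasonably recent scipy) recomputes the expected counts by hand for comparison, and also runs the test with `correction=False`, since the default for a 2x2 table applies the Yates continuity correction. The variable names are mine, chosen for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

obs = np.array([[25, 75], [100, 100]])

# Expected counts under independence:
# (row total * column total) / grand total, built by broadcasting.
row_totals = obs.sum(axis=1, keepdims=True)
col_totals = obs.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / obs.sum()

# Default call: Yates continuity correction is applied for 2x2 tables.
chi2_yates, p_yates, dof, expected = chi2_contingency(obs)

# Same test without the continuity correction.
chi2_raw, p_raw, _, _ = chi2_contingency(obs, correction=False)

print(expected)
print(chi2_yates, p_yates)
print(chi2_raw, p_raw)
```

The `expected` array returned by scipy should agree with `expected_manual`, and the two statistics correspond to the corrected and uncorrected forms of the test.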

Your contingency table is 2x2, so you can also use Fisher's exact test. This is implemented in scipy as scipy.stats.fisher_exact:

In [148]: from scipy.stats import fisher_exact

In [149]: oddsr, pval = fisher_exact(obs)

In [150]: pval
Out[150]: 3.7175015403965242e-05
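The first value returned by fisher_exact is easy to overlook. A small sketch, assuming Python 3 and the default settings, showing that it is the sample odds ratio of the table, i.e. (a*d)/(b*c), rather than a difference in proportions:

```python
import numpy as np
from scipy.stats import fisher_exact

obs = np.array([[25, 75], [100, 100]])

# Two-sided test is the default; spelled out here for clarity.
oddsr, pval = fisher_exact(obs, alternative="two-sided")

# Sample odds ratio computed by hand: (a * d) / (b * c).
sample_odds_ratio = (obs[0, 0] * obs[1, 1]) / (obs[0, 1] * obs[1, 0])

print(oddsr, sample_odds_ratio, pval)
```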

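scipy doesn't have much more than that for contingency tables. It looks like the next release of statsmodels will have more tools for analyzing contingency tables, but that doesn't help right now.

It's not hard to write some code to compute the difference in proportions and its 95% confidence interval. Here's one way: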

# Include this if you are using Python 2.7.  Or tweak the code in the
# function to ensure that division uses floating point.
from __future__ import division

import numpy as np


def diffprop(obs):
    """
    `obs` must be a 2x2 numpy array.

    Returns:
    delta
        The difference in proportions
    ci
        The Wald 95% confidence interval for delta
    corrected_ci
        Yates continuity correction for the 95% confidence interval of delta.
    """
    n1, n2 = obs.sum(axis=1)
    prop1 = obs[0,0] / n1
    prop2 = obs[1,0] / n2
    delta = prop1 - prop2

    # Wald 95% confidence interval for delta
    se = np.sqrt(prop1*(1 - prop1)/n1 + prop2*(1 - prop2)/n2)
    ci = (delta - 1.96*se, delta + 1.96*se)

    # Yates continuity correction for confidence interval of delta
    correction = 0.5*(1/n1 + 1/n2)
    corrected_ci = (ci[0] - correction, ci[1] + correction)

    return delta, ci, corrected_ci

For example,

In [22]: obs
Out[22]: 
array([[ 25,  75],
       [100, 100]])

In [23]: diffprop(obs)
Out[23]: 
(-0.25,
 (-0.35956733089748971, -0.14043266910251032),
 (-0.36706733089748972, -0.13293266910251031))

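The first value returned is the difference in proportions, delta. The next two pairs are the Wald 95% confidence interval for delta, and the Wald 95% confidence interval with the Yates continuity correction.

If you don't like those negative values, you can reverse the rows first: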

In [24]: diffprop(obs[::-1])
Out[24]: 
(0.25,
 (0.14043266910251032, 0.35956733089748971),
 (0.13293266910251031, 0.36706733089748972))

For comparison, here is a similar calculation in R:

> obs
     [,1] [,2]
[1,]   25   75
[2,]  100  100
> prop.test(obs, correct=FALSE)

    2-sample test for equality of proportions without continuity
    correction

data:  obs
X-squared = 17.1429, df = 1, p-value = 3.467e-05
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.3595653 -0.1404347
sample estimates:
prop 1 prop 2 
  0.25   0.50 

> prop.test(obs, correct=TRUE)

    2-sample test for equality of proportions with continuity correction

data:  obs
X-squared = 16.1297, df = 1, p-value = 5.915e-05
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.3670653 -0.1329347
sample estimates:
prop 1 prop 2 
  0.25   0.50
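The R intervals differ from the Python ones in the sixth decimal place. That is consistent with R using the exact normal quantile (about 1.959964) where the function above hard-codes 1.96. Here is a minimal standard-library sketch, assuming Python 3.8+ for statistics.NormalDist; the name `diffprop_ci` and its parameters are hypothetical, not part of any library:

```python
from math import sqrt
from statistics import NormalDist


def diffprop_ci(x1, n1, x2, n2, level=0.95, yates=False):
    """Wald interval for x1/n1 - x2/n2 (hypothetical helper).

    x1, x2 are success counts; n1, n2 are sample sizes.
    """
    p1, p2 = x1 / n1, x2 / n2
    delta = p1 - p2

    # Exact two-sided normal quantile instead of the rounded 1.96.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

    # Optional Yates continuity correction widens the interval.
    if yates:
        half_width += 0.5 * (1 / n1 + 1 / n2)

    return delta, (delta - half_width, delta + half_width)


delta, ci = diffprop_ci(25, 100, 100, 200)
_, corrected_ci = diffprop_ci(25, 100, 100, 200, yates=True)
print(delta, ci, corrected_ci)
```

For this table the two intervals agree with the prop.test output to the digits R prints. I have not checked that the continuity correction matches R in every case, so treat that part as an approximation.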

Answer 2

I would just add n1 = float(n1) and n2 = float(n2) to the function above. They should be converted to floats (for Python 2 users); otherwise the division will only yield 0.
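A rough sketch of where those conversions would go, using plain nested lists so it does not depend on numpy. The name `diffprop_delta` is made up for illustration and only the difference in proportions is computed:

```python
def diffprop_delta(obs):
    """Difference in proportions for a 2x2 table given as nested lists.

    Hypothetical helper; the row totals are converted to float so that
    the divisions below are not truncated under Python 2.
    """
    n1 = float(sum(obs[0]))
    n2 = float(sum(obs[1]))
    prop1 = obs[0][0] / n1
    prop2 = obs[1][0] / n2
    return prop1 - prop2


delta_float = diffprop_delta([[25, 75], [100, 100]])
print(delta_float)
```

In Python 3 the `/` operator already performs true division, so the conversion is harmless but not required there.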

Running a chi-squared test with observed and expected counts and getting a confidence interval
