Question

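I'm trying to write my own Python code to compute t-statistics and p-values for one- and two-tailed independent t-tests. I could use the normal approximation, but for now I'm trying to use the t-distribution. I haven't been able to match the results of SciPy's stats library on my test data. I could use a fresh pair of eyes to see whether I'm just making a dumb mistake somewhere.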

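Note that this is cross-posted from Cross Validated. It has gone unanswered there for a while, so I figured it couldn't hurt to get some software developers' opinions too. I'm trying to understand whether there is a mistake in the algorithm I'm using, which should reproduce SciPy's results. It's a simple algorithm, so I'm having a hard time finding the source of the error.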

My code:

import numpy as np
import scipy.stats as st

def compute_t_stat(pop1,pop2):

    num1 = pop1.shape[0]
    num2 = pop2.shape[0]

    # The formula for t-stat when population variances differ.
    t_stat = (np.mean(pop1) - np.mean(pop2))/np.sqrt( np.var(pop1)/num1 + np.var(pop2)/num2 )

    # ADDED: The Welch-Satterthwaite degrees of freedom.
    df = ((np.var(pop1)/num1 + np.var(pop2)/num2)**(2.0))/(   (np.var(pop1)/num1)**(2.0)/(num1-1) +  (np.var(pop2)/num2)**(2.0)/(num2-1) ) 

    # Am I computing this wrong?
    # It should just come from the CDF like this, right?
    # The extra parameter is the degrees of freedom.

    one_tailed_p_value = 1.0 - st.t.cdf(t_stat,df)
    two_tailed_p_value = 1.0 - ( st.t.cdf(np.abs(t_stat),df) - st.t.cdf(-np.abs(t_stat),df) )    


    # Computing with SciPy's built-ins
    # My results don't match theirs.
    t_ind, p_ind = st.ttest_ind(pop1, pop2)

    return t_stat, one_tailed_p_value, two_tailed_p_value, t_ind, p_ind

Update

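After reading about Welch's t-test, I saw that I should use the Welch-Satterthwaite formula to compute the degrees of freedom. I have updated the code above to reflect this.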

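With the new degrees of freedom, my results are closer. My two-sided p-value is off from SciPy's by about 0.008... but that is still too large an error, so I must still be doing something wrong (or SciPy's distribution functions are very poor, but it's hard to believe they are only accurate to 2 decimal places).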

Second update:

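While experimenting further, I wondered whether SciPy's version automatically switches to the normal approximation of the t-distribution once the degrees of freedom are high enough (roughly > 30). So I reran my code using the normal distribution, and the results actually moved further away from SciPy's than when I used the t-distribution.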
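For reference, the normal distribution has thinner tails than any t-distribution. So for the same statistic it always gives a smaller two-sided p-value, and the two only converge as the degrees of freedom grow. Below is a minimal sketch. The statistic 2.0 and df 10 are arbitrary illustrative values, and `two_sided_p` is a hypothetical helper.

```python
import scipy.stats as st

def two_sided_p(t_stat, df=None):
    """Two-sided p-value from the t distribution, or from the standard
    normal when df is None (the large-sample approximation)."""
    if df is None:
        return 2.0 * st.norm.sf(abs(t_stat))
    return 2.0 * st.t.sf(abs(t_stat), df)

# Hypothetical statistic and degrees of freedom, for illustration only.
t_stat, df = 2.0, 10
p_t = two_sided_p(t_stat, df)
p_norm = two_sided_p(t_stat)
print(p_t, p_norm)  # the normal-based p-value is the smaller one
```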

Bonus question :) (more about statistical theory; feel free to ignore)

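Also, the t-statistic is negative. I'm wondering what this means for a one-sided t-test. Does it usually mean I should test in the negative direction? In my test data, population 1 is a control group that did not receive some employment training program. Population 2 did receive it, and the measured data are the differences in wages before and after treatment.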

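So I have reason to believe the mean of population 2 will be larger. But from a statistical-theory standpoint, it doesn't seem right to set up the test this way. How would I know to check the negative direction (one-sided test) without relying on subjective knowledge of the data? Or is this just one of those frequentist things that, while not philosophically rigorous, has to be done in practice?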

Answer 1

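Using SciPy's built-in function source(), I printed out the source code of ttest_ind(). The source shows that SciPy's built-in performs the t-test assuming the two samples have equal variances. It does not use the Welch-Satterthwaite degrees of freedom. SciPy assumes equal variances but does not state this assumption.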
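Below is a minimal sketch of the pooled-variance (Student's) test, which, per the source reading above, is what the default call computes. In current SciPy releases this choice is controlled by the `equal_var` parameter of `ttest_ind`, which defaults to `True`. `pooled_t_test` is a hypothetical helper, and the samples are randomly generated for illustration only.

```python
import numpy as np
import scipy.stats as st

def pooled_t_test(a, b):
    """Student's two-sample t-test with a pooled (equal) variance estimate."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    # Pooled variance: df-weighted average of the two ddof=1 sample variances.
    sp2 = ((n1 - 1) * np.var(a, ddof=1) + (n2 - 1) * np.var(b, ddof=1)) / (n1 + n2 - 2)
    t_stat = (a.mean() - b.mean()) / np.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    df = n1 + n2 - 2
    p_two_sided = 2.0 * st.t.sf(abs(t_stat), df)
    return t_stat, p_two_sided

# Made-up samples with unequal sizes and spreads.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=25)
y = rng.normal(0.5, 3.0, size=40)
t_pooled, p_pooled = pooled_t_test(x, y)
res = st.ttest_ind(x, y)  # default call
print(t_pooled, res.statistic)
print(p_pooled, res.pvalue)
```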

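I just want to point out that this is exactly why you shouldn't blindly trust library functions. In my case, I really do need a t-test for populations with unequal variances, and for some of the smaller data sets I'll be running it on, the degrees-of-freedom adjustment could matter.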

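As I mentioned in some comments, the difference between my code and SciPy's was about 0.008 for sample sizes between 30 and 400, and it slowly went to zero for larger samples. This is the effect of the extra (1/n1 + 1/n2) term in the denominator of the equal-variance t-statistic. In terms of accuracy this matters a lot, especially for small samples. It definitely confirmed that I need to write my own function. (There may be other, better Python libraries, but this should at least be known. Frankly, it's surprising that it isn't mentioned anywhere in the SciPy documentation for ttest_ind().)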
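For reference, here are the two standard errors being compared, with $s_1^2, s_2^2$ the sample variances (divisor $n-1$). One consequence follows from the algebra: when $n_1 = n_2 = n$, the two denominators coincide, so any remaining difference in p-values then comes only from the degrees of freedom ($n_1 + n_2 - 2$ versus the Welch-Satterthwaite value).

```latex
SE_{\text{Welch}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \qquad
SE_{\text{pooled}} = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \qquad
s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}

\text{If } n_1 = n_2 = n:\quad
s_p^2 = \frac{s_1^2 + s_2^2}{2}, \qquad
SE_{\text{pooled}}^2 = \frac{s_1^2 + s_2^2}{2}\cdot\frac{2}{n} = \frac{s_1^2 + s_2^2}{n} = SE_{\text{Welch}}^2
```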

Answer 2

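You are not computing the sample variance; you're using the population variance. The sample variance divides by n-1, not n. This is exactly why np.var has an optional parameter called ddof.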
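A tiny illustration of what `ddof` changes, using a hand-checkable array. The squared deviations from the mean 2.5 sum to 5, so dividing by n = 4 gives 1.25 and dividing by n - 1 = 3 gives 5/3.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
# ddof=0 (the default) divides the sum of squared deviations by n.
pop_var = np.var(x)
# ddof=1 divides by n - 1, giving the unbiased sample variance.
sample_var = np.var(x, ddof=1)
print(pop_var, sample_var)  # 1.25 1.6666666666666667
```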

This should give you the expected result:

import numpy as np
import scipy.stats as st

def compute_t_stat(pop1,pop2):

    num1 = pop1.shape[0]
    num2 = pop2.shape[0]
    var1 = np.var(pop1, ddof=1)
    var2 = np.var(pop2, ddof=1)

    # The formula for t-stat when population variances differ.
    t_stat = (np.mean(pop1) - np.mean(pop2)) / np.sqrt(var1/num1 + var2/num2)

    # ADDED: The Welch-Satterthwaite degrees of freedom.
    df = ((var1/num1 + var2/num2)**(2.0))/((var1/num1)**(2.0)/(num1-1) + (var2/num2)**(2.0)/(num2-1)) 

    # Am I computing this wrong?
    # It should just come from the CDF like this, right?
    # The extra parameter is the degrees of freedom.

    one_tailed_p_value = 1.0 - st.t.cdf(t_stat,df)
    two_tailed_p_value = 1.0 - ( st.t.cdf(np.abs(t_stat),df) - st.t.cdf(-np.abs(t_stat),df) )    


    # Computing with SciPy's built-ins
    # My results don't match theirs.
    t_ind, p_ind = st.ttest_ind(pop1, pop2)

    return t_stat, one_tailed_p_value, two_tailed_p_value, t_ind, p_ind
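To compare like with like, the Welch computation should be checked against `ttest_ind` called with `equal_var=False`. The default call pools the variances, as Answer 1 describes. Below is a minimal sketch under that assumption, with random illustrative data and a hypothetical helper `welch_t_test`. It uses `2 * st.t.sf(|t|, df)` for the two-sided p-value. This is algebraically the same as `1 - (cdf(|t|) - cdf(-|t|))`, but it avoids subtracting numbers close to 1 when |t| is large.

```python
import numpy as np
import scipy.stats as st

def welch_t_test(a, b):
    """Welch's t-test: ddof=1 variances and Welch-Satterthwaite df."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    v1 = np.var(a, ddof=1) / a.size
    v2 = np.var(b, ddof=1) / b.size
    t_stat = (a.mean() - b.mean()) / np.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (a.size - 1) + v2 ** 2 / (b.size - 1))
    p_two_sided = 2.0 * st.t.sf(abs(t_stat), df)
    return t_stat, df, p_two_sided

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=12)
y = rng.normal(1.0, 4.0, size=30)
t_stat, df, p = welch_t_test(x, y)
res = st.ttest_ind(x, y, equal_var=False)
print(t_stat, res.statistic)
print(p, res.pvalue)
```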

PS: SciPy is open source and mostly implemented in Python. You could have checked the source code of ttest_ind and found the discrepancy yourself.
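One way to read that source from Python itself is the standard-library `inspect` module, which does not depend on any NumPy/SciPy helper. This is a sketch. Recent SciPy releases wrap `ttest_ind` in decorators. `inspect.getsource` unwraps functions that set `__wrapped__`, but depending on the version you may see wrapper code instead of the test itself.

```python
import inspect
import scipy.stats as st

# Retrieve the Python source text of ttest_ind (or of its wrapper).
src = inspect.getsource(st.ttest_ind)
print(src[:2000])  # first part of the source; print(src) shows all of it
```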

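As for the bonus question: you don't choose a one-tailed test by looking at your t-value. You decide it beforehand, from your hypotheses. Suppose your null hypothesis is that the means are equal and your alternative is that the second mean is larger. Then your tail should be on the left (negative) side, because sufficiently small (negative) t-values are evidence against the null hypothesis in favor of the alternative.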
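Below is a sketch of stating that left-tailed alternative up front in SciPy. It assumes a SciPy version whose `ttest_ind` accepts the `alternative` parameter (1.6 or later). The data are random and illustrative only. The control group is passed first, so "treated mean is larger" corresponds to `alternative="less"`, and the p-value is the left-tail probability `t.cdf(t, df)`.

```python
import numpy as np
import scipy.stats as st

# Illustrative data only: the treated group is generated with a larger mean.
rng = np.random.default_rng(2)
control = rng.normal(0.0, 2.0, size=50)
treated = rng.normal(1.0, 2.0, size=60)

# H0: mean(control) == mean(treated); H1: mean(control) < mean(treated).
# With control as the first argument, H1 puts the rejection region in the left tail.
res = st.ttest_ind(control, treated, equal_var=False, alternative="less")
print(res.statistic, res.pvalue)
```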

Answer 3

It looks like you forgot the **2 on the numerator of your df (the Welch-Satterthwaite degrees of freedom).

df = (np.var(pop1)/num1 + np.var(pop2)/num2)/(   (np.var(pop1)/num1)**(2.0)/(num1-1) +  (np.var(pop2)/num2)**(2.0)/(num2-1) )

It should be:

df = (np.var(pop1)/num1 + np.var(pop2)/num2)**2/(   (np.var(pop1)/num1)**(2.0)/(num1-1) +  (np.var(pop2)/num2)**(2.0)/(num2-1) )
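Written out, the Welch-Satterthwaite approximation squares the entire numerator. Here $s_1^2, s_2^2$ are the sample variances with the $n-1$ divisor.

```latex
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
                 {\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
```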

Tracking down the assumptions made by SciPy's `ttest_ind()` function
