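I am trying to re-create a two-tailed one-sample t-test from scratch in Python to deepen my understanding. My code seems to work for some data samples, but I found an example where its output does not match that of scipy.stats.ttest_1samp, and I am trying to work out why.
The t statistics match, but I get a different p-value. Is my use of t.cdf wrong, leading to the incorrect p-value?
My code: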
import numpy as np
import scipy.stats as scs

sample = [10.81261135, 9.68035252, 9.87293556, 10.06308861,
          9.57381722, 10.00922156, 10.90522431, 9.70843104,
          10.16614481, 10.09447189, 10.51260742, 10.17503686,
          10.38718472, 10.52334431, 9.55808306, 10.24290938,
          10.6048062, 10.27535938, 9.6329808, 9.67338239]
mu = 7.128061097

def one_sample_ttest(sample, mu):
    sam_mean = np.mean(sample)
    sam_std = np.std(sample, ddof=1)  # sample standard deviation
    n = len(sample)
    df = n - 1                        # degrees of freedom
    t = (sam_mean - mu) / (sam_std / (n ** (1 / 2.)))
    p = (scs.t.cdf(t, df)) * 2        # the line in question
    return (t, p)

one_sample_ttest(sample, mu)
My result:
(32.369715406889142, 2.0)
Result from scipy.stats.ttest_1samp:
Ttest_1sampResult(statistic=32.369715406889142, pvalue=4.3828444145707213e-18)
Answer 0 (score: 1)
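Replace
p = (scs.t.cdf(t,df))*2
with
p = (scs.t.sf(abs(t),df))*2
or
p = min(scs.t.cdf(t,df), scs.t.sf(t, df))*2
t.sf(x, df) is the survival function, i.e. 1 - t.cdf(x, df). Your t statistic is large and positive, so t.cdf(t, df) is essentially 1 and doubling it gives 2.0. A two-tailed p-value needs the tail area beyond |t|, which is what t.sf(abs(t), df) returns.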
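For reference, here is a minimal sketch of the corrected calculation checked against scipy.stats.ttest_1samp. The function name one_sample_ttest is only an illustrative choice and does not come from the question; the data and mu are the ones posted above.

```python
import numpy as np
from scipy import stats

def one_sample_ttest(sample, mu):
    """Two-tailed one-sample t-test; returns (t, p). Name is illustrative."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    df = n - 1
    # t statistic: distance of the sample mean from mu in standard errors
    t = (sample.mean() - mu) / (sample.std(ddof=1) / np.sqrt(n))
    # two-tailed p-value: twice the upper-tail area beyond |t|
    p = 2 * stats.t.sf(abs(t), df)
    return t, p

sample = [10.81261135, 9.68035252, 9.87293556, 10.06308861,
          9.57381722, 10.00922156, 10.90522431, 9.70843104,
          10.16614481, 10.09447189, 10.51260742, 10.17503686,
          10.38718472, 10.52334431, 9.55808306, 10.24290938,
          10.6048062, 10.27535938, 9.6329808, 9.67338239]
mu = 7.128061097

t, p = one_sample_ttest(sample, mu)
ref = stats.ttest_1samp(sample, mu)

print(t, p)                       # from-scratch result
print(ref.statistic, ref.pvalue)  # scipy's result

# The formula from the question, for comparison: cdf(t) is close to 1
# for a large positive t, so doubling it gives a value close to 2.
p_original = 2 * stats.t.cdf(t, len(sample) - 1)
print(p_original)
```

Using sf rather than 1 - cdf also avoids losing precision when the tail area is very small, as it is here.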