How do I perform an F-test in Python to check whether the variances of two vectors are equal?
For example, if I have
a = [1,2,1,2,1,2,1,2,1,2]
b = [1,3,-1,2,1,5,-1,6,-1,2]
is there something similar to
scipy.stats.ttest_ind(a, b)
I found
sp.stats.f(a, b)
but it appears to be something different from an F-test.
Answer 0 (score: 31)
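The test statistic for the F-test of equal variances is simply:
F = Var(X) / Var(Y)
Under the null hypothesis, F follows an F distribution with df1 = len(X) - 1, df2 = len(Y) - 1 degrees of freedom, where Var is the sample variance.
scipy.stats.f has a CDF method (cdf) and a survival-function method (sf, equal to 1 - cdf). This means you can generate a p-value for the given statistic and test whether that p-value is smaller than your chosen alpha level. For the one-sided alternative Var(X) > Var(Y), the p-value is the upper-tail probability.
Thus:
alpha = 0.05  # or whatever you want your alpha to be
p_value = scipy.stats.f.sf(F, df1, df2)  # upper tail, i.e. 1 - cdf(F, df1, df2)
if p_value < alpha:
    # Reject the null hypothesis that Var(X) == Var(Y)
    print("Reject H0")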
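To make the tail bookkeeping concrete, here is a minimal sketch that applies the statistic to the two vectors from the question. It assumes sample variances with ddof=1, and the variable names (p_lower, p_upper, p_two_sided) are illustrative only.

```python
import numpy as np
from scipy import stats

a = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
b = [1, 3, -1, 2, 1, 5, -1, 6, -1, 2]

# Sample variances (ddof=1) and the F statistic Var(a) / Var(b)
var_a = np.var(a, ddof=1)
var_b = np.var(b, ddof=1)
F = var_a / var_b
df1, df2 = len(a) - 1, len(b) - 1

# cdf gives the lower tail, sf gives the upper tail (sf = 1 - cdf)
p_lower = stats.f.cdf(F, df1, df2)   # evidence for Var(a) < Var(b)
p_upper = stats.f.sf(F, df1, df2)    # evidence for Var(a) > Var(b)
# One common two-sided convention: double the smaller tail, capped at 1
p_two_sided = min(1.0, 2.0 * min(p_lower, p_upper))

print("F =", F)
print("lower-tail p =", p_lower)
print("upper-tail p =", p_upper)
print("two-sided p =", p_two_sided)
```

Here F is far below 1, so the lower tail is the small one. Reading the raw cdf value as "the" p-value only makes sense for the alternative Var(a) < Var(b).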
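Note that the F-test is extremely sensitive to non-normality of X and Y, so unless you are reasonably sure that X and Y are normally distributed, you are probably better off with a more robust test such as Levene's test or Bartlett's test. Both are available in the SciPy API, as scipy.stats.levene and scipy.stats.bartlett.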
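As a rough sketch of how these alternatives might be called on the vectors from the question, using scipy.stats.levene and scipy.stats.bartlett with their default settings (the 0.05 threshold is only an illustrative choice):

```python
from scipy import stats

a = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
b = [1, 3, -1, 2, 1, 5, -1, 6, -1, 2]

# Levene's test; SciPy's default centers each group on its median
lev_stat, lev_p = stats.levene(a, b)
# Bartlett's test for equal variances
bart_stat, bart_p = stats.bartlett(a, b)

alpha = 0.05  # illustrative significance level
for name, p in (("Levene", lev_p), ("Bartlett", bart_p)):
    verdict = "reject" if p < alpha else "cannot reject"
    print(name, "p =", p, "->", verdict, "equal variances")
```

Both functions accept two or more samples, so the same calls extend to more than two groups.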
Answer 1 (score: 5)
To do a one-way ANOVA, you can use
import scipy.stats as stats
stats.f_oneway(a,b)
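A slightly fuller sketch of that call on the vectors from the question is shown below. Keep in mind that f_oneway compares group means. These two vectors have similar means (1.5 and 1.7) but very different spreads, so the result is not expected to flag a difference.

```python
import numpy as np
import scipy.stats as stats

a = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
b = [1, 3, -1, 2, 1, 5, -1, 6, -1, 2]

# One-way ANOVA: F statistic and p-value for equality of the group means
f_stat, p_value = stats.f_oneway(a, b)

print("means:", np.mean(a), np.mean(b))
print("sample variances:", np.var(a, ddof=1), np.var(b, ddof=1))
print("ANOVA F =", f_stat, "p =", p_value)
```

A large p-value here says nothing about the variances being equal, which is what the original question asks about.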
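ANOVA is a way of checking whether the variance between groups is greater than the variance within groups, and it computes the probability of observing this variance ratio using the F distribution. In other words, it tests whether the group means differ, not whether the variances are equal. A good tutorial can be found here: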
Answer 2 (score: 4)
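For those who came here looking for an ANOVA F-test, or for a way to compare models for feature selection:
sklearn.feature_selection.f_classif
performs ANOVA tests, and
sklearn.feature_selection.f_regression
performs univariate linear regression F-tests, one feature at a time.
Answer 3 (score: 1)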
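Here is a simple function that computes a one-sided or two-sided F-test using Python and SciPy. The results have been checked against the output of R's var.test() function. Keep in mind the warnings in the other answers about the F-test's sensitivity to non-normality.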
import numpy as np
import scipy.stats as st

def f_test(x, y, alt="two_sided"):
    """
    Calculates the F-test.
    :param x: The first group of data
    :param y: The second group of data
    :param alt: The alternative hypothesis, one of "two_sided" (default), "greater" or "less"
    :return: a tuple with the F statistic value and the p-value.
    """
    df1 = len(x) - 1
    df2 = len(y) - 1
    # sample variances (ddof=1); np.var also accepts plain lists
    f = np.var(x, ddof=1) / np.var(y, ddof=1)
    if alt == "greater":
        p = st.f.sf(f, df1, df2)  # upper tail, 1 - cdf
    elif alt == "less":
        p = st.f.cdf(f, df1, df2)
    else:
        # two-sided by default (cf. Crawley, The R Book, p.355)
        # double the smaller tail so the p-value cannot exceed 1 when f < 1
        p = 2.0 * min(st.f.cdf(f, df1, df2), st.f.sf(f, df1, df2))
    return f, p
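One property worth checking in any two-sided variant is that the p-value does not depend on which sample ends up in the numerator. The sketch below uses a hypothetical helper, two_sided_f_pvalue (not part of SciPy), that doubles the smaller tail. It also shows that doubling only the upper tail, 2 * (1 - cdf), can exceed 1 when F < 1.

```python
import math
import numpy as np
from scipy import stats

def two_sided_f_pvalue(x, y):
    """Hypothetical helper: F = Var(x)/Var(y) with a two-sided p-value."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    f = np.var(x, ddof=1) / np.var(y, ddof=1)
    dfn, dfd = x.size - 1, y.size - 1
    lower = stats.f.cdf(f, dfn, dfd)
    upper = stats.f.sf(f, dfn, dfd)
    # Double the smaller tail and cap at 1
    return f, min(1.0, 2.0 * min(lower, upper))

a = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
b = [1, 3, -1, 2, 1, 5, -1, 6, -1, 2]

f_ab, p_ab = two_sided_f_pvalue(a, b)
f_ba, p_ba = two_sided_f_pvalue(b, a)

# Doubling only the upper tail is not a valid probability when F < 1
naive = 2.0 * (1.0 - stats.f.cdf(f_ab, len(a) - 1, len(b) - 1))

print("F(a/b) =", f_ab, "p =", p_ab)
print("F(b/a) =", f_ba, "p =", p_ba)
print("naive upper-tail doubling:", naive)
```

Swapping the samples inverts F and swaps the degrees of freedom, so the two calls should report the same p-value up to floating-point error.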
Answer 4 (score: 0)
If you need a two-tailed test, you can proceed as follows. I chose alpha = 0.05:
import numpy as np
from scipy import stats

a = [1,2,1,2,1,2,1,2,1,2]
b = [1,3,-1,2,1,5,-1,6,-1,2]
print('Variance a={0:.3f}, Variance b={1:.3f}'.format(np.var(a, ddof=1), np.var(b, ddof=1)))
fstatistics = np.var(a, ddof=1)/np.var(b, ddof=1)  # ddof=1 because we estimate the mean from the data
fdistribution = stats.f(len(a)-1, len(b)-1)  # build an F-distribution object
# two-tailed p-value: double the smaller tail probability of the observed statistic
p_value = 2*min(fdistribution.cdf(fstatistics), 1-fdistribution.cdf(fstatistics))
f_critical1 = fdistribution.ppf(0.025)
f_critical2 = fdistribution.ppf(0.975)
print(fstatistics, f_critical1, f_critical2)
if p_value < 0.05:
    print('Reject H0', p_value)
else:
    print('Cant Reject H0', p_value)
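If you want an ANOVA-style test, where only large values of the statistic lead to rejection, you can run a right-tailed test. You need to pay attention to the order of the variances (fstatistics = var1 / var2 or var2 / var1):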
import numpy as np
from scipy import stats

a = [1,2,1,2,1,2,1,2,1,2]
b = [1,3,-1,2,1,5,-1,6,-1,2]
print('Variance a={0:.3f}, Variance b={1:.3f}'.format(np.var(a, ddof=1), np.var(b, ddof=1)))
# larger variance on top; ddof=1 because we estimate the mean from the data
fstatistics = max(np.var(a, ddof=1), np.var(b, ddof=1))/min(np.var(a, ddof=1), np.var(b, ddof=1))
# df order must match numerator/denominator; both samples have 10 points here, so it is (9, 9) either way
fdistribution = stats.f(len(a)-1, len(b)-1)  # build an F-distribution object
p_value = 1-fdistribution.cdf(fstatistics)
f_critical = fdistribution.ppf(0.95)
print(fstatistics, f_critical)
if p_value < 0.05:
    print('Reject H0', p_value)
else:
    print('Cant Reject H0', p_value)
A left-tailed test can be done as follows:
import numpy as np
from scipy import stats

a = [1,2,1,2,1,2,1,2,1,2]
b = [1,3,-1,2,1,5,-1,6,-1,2]
print('Variance a={0:.3f}, Variance b={1:.3f}'.format(np.var(a, ddof=1), np.var(b, ddof=1)))
# smaller variance on top; ddof=1 because we estimate the mean from the data
fstatistics = min(np.var(a, ddof=1), np.var(b, ddof=1))/max(np.var(a, ddof=1), np.var(b, ddof=1))
# df order must match numerator/denominator; both samples have 10 points here, so it is (9, 9) either way
fdistribution = stats.f(len(a)-1, len(b)-1)  # build an F-distribution object
p_value = fdistribution.cdf(fstatistics)
f_critical = fdistribution.ppf(0.05)
print(fstatistics, f_critical)
if p_value < 0.05:
    print('Reject H0', p_value)
else:
    print('Cant Reject H0', p_value)
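The snippets above use two samples of equal length, so the order of the degrees of freedom never matters. With unequal lengths it does: the numerator degrees of freedom must belong to whichever sample's variance is on top. The sketch below illustrates this with a hypothetical helper, right_tailed_f, and a shortened copy of b (its first six values) chosen only to make the sample sizes differ.

```python
import numpy as np
from scipy import stats

def right_tailed_f(x, y):
    """Hypothetical helper: larger sample variance on top, df kept in step."""
    var_x, var_y = np.var(x, ddof=1), np.var(y, ddof=1)
    if var_x >= var_y:
        f, dfn, dfd = var_x / var_y, len(x) - 1, len(y) - 1
    else:
        f, dfn, dfd = var_y / var_x, len(y) - 1, len(x) - 1
    # Upper-tail probability under F(dfn, dfd)
    return f, dfn, dfd, stats.f.sf(f, dfn, dfd)

a = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
b_short = [1, 3, -1, 2, 1, 5]  # first six values of b, for illustration only

f, dfn, dfd, p_right = right_tailed_f(a, b_short)
# What you would get if the df were left in (a, b_short) order by mistake
p_swapped = stats.f.sf(f, dfd, dfn)

print("F =", f, "df =", (dfn, dfd))
print("p with matching df:", p_right)
print("p with swapped df: ", p_swapped)
```

Choosing the numerator after looking at the data makes the nominal one-sided level optimistic, so treat this as an illustration of the degrees-of-freedom bookkeeping rather than a recommended testing procedure.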