I want to apply the Kruskal test to several columns. I set things up as below:

import pandas as pd
import scipy.stats
df = pd.DataFrame({'a': range(9), 'b': [1, 2, 3, 1, 2, 3, 1, 2, 3], 'group': ['a', 'b', 'c'] * 3})

and then loop:

groups = {}
res = []
for grp in df['group'].unique():
    for column in df[['a', 'b']]:  # columns selected by label
        groups[grp] = df[column][df['group'] == grp].values
args = groups.values()
g = scipy.stats.kruskal(*args)
res.append(g)
print(res)

I get

[KruskalResult(statistic=8.0000000000000036, pvalue=0.018315638888734137)]

but I want

[KruskalResult(statistic=0.80000000000000071, pvalue=0.67032004603563911)]
[KruskalResult(statistic=8.0000000000000036, pvalue=0.018315638888734137)]

Where is my mistake? For a single column the test runs as expected.
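Answer 0 (score: 1)
Your for loops are inverted. The single-column test is loop-invariant with respect to the column you choose, so the loop over columns has to be the outer loop. In plain English: "for each column, apply the Kruskal test, which consists of this loop over df['group'].unique()":

groups = {}
res = []
for column in df[['a', 'b']]:  # outer loop: one test per column
    for grp in df['group'].unique():  # inner loop: this column's values, per group
        groups[grp] = df[column][df['group'] == grp].values
    # run the test once all groups of the current column are collected
    args = groups.values()
    g = scipy.stats.kruskal(*args)
    res.append(g)
print(res)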
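
The same "one test per column" idea can also be written with groupby, which avoids the shared groups dictionary. This is only a sketch under assumptions: the helper name kruskal_per_column is made up for illustration, and the DataFrame is the one from the question.

```python
import pandas as pd
from scipy import stats

# Same data as in the question.
df = pd.DataFrame({'a': range(9),
                   'b': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'group': ['a', 'b', 'c'] * 3})


def kruskal_per_column(frame, columns, by):
    """Run one Kruskal-Wallis test per column (hypothetical helper)."""
    results = {}
    for column in columns:  # outer loop: one test per column
        # Split this column's values by group label.
        samples = [values.to_numpy() for _, values in frame.groupby(by)[column]]
        results[column] = stats.kruskal(*samples)
    return results


res = kruskal_per_column(df, ['a', 'b'], by='group')
for column, result in res.items():
    print(column, result.statistic, result.pvalue)
```

Returning a dict keyed by column name keeps each result attached to the column it belongs to, which a plain list does not.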
Answer 1 (score: 0)
groups = {}
res = []
for column in df[['a', 'b']]:  # columns selected by label
    for grp in df['group'].unique():
        groups[grp] = df[column][df['group'] == grp].values
    args = groups.values()
    g = scipy.stats.kruskal(*args)
    res.append(g)
print(res)
I get
[KruskalResult(statistic=8.0000000000000036, pvalue=0.018315638888734137)]
The problem was the indentation((
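
To see why the indentation matters, here is a small sketch; the two function names are mine, not from the post. If the test is called once after both loops, the groups dictionary only holds the values of the last column visited, so a single result comes back. If the test sits inside the column loop, each column gets its own result.

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({'a': range(9),
                   'b': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'group': ['a', 'b', 'c'] * 3})
COLUMNS = ['a', 'b']


def kruskal_after_loops(frame):
    # The test runs once, after both loops have finished: every
    # groups[grp] entry was overwritten by the last column ('b').
    groups, res = {}, []
    for grp in frame['group'].unique():
        for column in COLUMNS:
            groups[grp] = frame.loc[frame['group'] == grp, column].to_numpy()
    res.append(stats.kruskal(*groups.values()))
    return res


def kruskal_inside_column_loop(frame):
    # The test runs once per column, right after that column's
    # groups have been collected.
    groups, res = {}, []
    for column in COLUMNS:
        for grp in frame['group'].unique():
            groups[grp] = frame.loc[frame['group'] == grp, column].to_numpy()
        res.append(stats.kruskal(*groups.values()))
    return res


once = kruskal_after_loops(df)
per_column = kruskal_inside_column_loop(df)
print(len(once), len(per_column))
```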