Loop over pandas columns

Time: 2016-10-14 21:29:55

Tags: python loops pandas

I want to apply the Kruskal-Wallis test to several columns. I set up the data as below:

import pandas as pd
import scipy.stats
df = pd.DataFrame({'a':range(9), 'b':[1,2,3,1,2,3,1,2,3], 'group':['a', 'b', 'c']*3})

and then run the loop:

groups = {}
res = []
for grp in df['group'].unique():
    for column in df[[0, 1]]:
        groups[grp] = df[column][df['group']==grp].values
    args = groups.values()
g = scipy.stats.kruskal(*args)
res.append(g)
print (res) 

I get

[KruskalResult(statistic=8.0000000000000036, pvalue=0.018315638888734137)]

but I want

[KruskalResult(statistic=0.80000000000000071, pvalue=0.67032004603563911)]
[KruskalResult(statistic=8.0000000000000036, pvalue=0.018315638888734137)]

Where is my mistake?

Running the test on a single column at a time already works for me.

2 answers:

Answer 0: (score: 1)

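Your for loops are inverted. The single-column algorithm is loop-invariant with respect to the column you choose, so the loop over columns has to be the outer loop. In plain English: "for each column, apply the Kruskal algorithm, which itself consists of this loop over `df['group'].unique()`". The test and the `res.append` must also sit inside the column loop, so that one result is stored per column: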

groups = {}
res = []
for column in df[[0, 1]]:
    for grp in df['group'].unique():
        groups[grp] = df[column][df['group']==grp].values
    args = groups.values()
    g = scipy.stats.kruskal(*args)
    res.append(g)
print (res) 
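
For readers on a current pandas release, here is a minimal self-contained sketch of the same loop order. It rests on two assumptions that are not in the answer above: the columns are selected by their labels `'a'` and `'b'`, because in recent pandas `df[[0, 1]]` looks for columns literally labelled 0 and 1 and is expected to raise a `KeyError` on this frame, and `groups` is recreated for every column so that no values can leak from one column into the next.

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    'a': range(9),
    'b': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'group': ['a', 'b', 'c'] * 3,
})

res = []
# Outer loop: one Kruskal-Wallis test per data column.
# Selecting by label is an assumption made for current pandas versions.
for column in ['a', 'b']:
    groups = {}  # fresh dict for each column
    # Inner loop: collect this column's values for every group.
    for grp in df['group'].unique():
        groups[grp] = df.loc[df['group'] == grp, column].values
    # Still inside the column loop: test and store one result per column.
    res.append(stats.kruskal(*groups.values()))

print(res)
```

With the sample frame from the question this should yield two results, with statistics of roughly 0.8 for `a` and 8.0 for `b`.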

Answer 1: (score: 0)

Previously I was doing this:

groups = {}
res = []
for column in df[[0, 1]]:
    for grp in df['group'].unique():
        groups[grp] = df[column][df['group']==grp].values
    args = groups.values()
g = scipy.stats.kruskal(*args)
res.append(g)
print (res)

and I got

[KruskalResult(statistic=8.0000000000000036, pvalue=0.018315638888734137)]

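The problem was the indentation: `g = scipy.stats.kruskal(*args)` and `res.append(g)` sat outside the column loop, so the test ran only once, on the values left over from the last column.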
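
One way to make this kind of indentation slip harder to commit is to move the per-column work into a function, so the test and its inputs cannot drift apart. The sketch below is only an illustration and not code from either answer: `kruskal_by_group` is a hypothetical helper name, and it uses `DataFrame.groupby` in place of the manual loop over `unique()`.

```python
import pandas as pd
from scipy import stats


def kruskal_by_group(frame, column, by='group'):
    """Run a Kruskal-Wallis test on `column`, one sample per value of `by`.

    Hypothetical helper for illustration only.
    """
    # Iterating a grouped Series yields (group label, Series) pairs.
    samples = [values.to_numpy() for _, values in frame.groupby(by)[column]]
    return stats.kruskal(*samples)


df = pd.DataFrame({
    'a': range(9),
    'b': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'group': ['a', 'b', 'c'] * 3,
})

# One result per column, keyed by column name.
results = {col: kruskal_by_group(df, col) for col in ['a', 'b']}
for col, result in results.items():
    print(col, result)
```

Keying the results by column name also makes it obvious which test belongs to which column, something the plain `res` list does not record.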