pandas or numpy for conditional averaging across multiple columns

Time: 2015-08-03 02:19:14

Tags: python numpy pandas

I have a large dataset (length = 454605) that looks like this:

ID  Se  Min Va
1   1   1   2
1   1   1   2
1   1   1   3
-   -   -   -
24  4   26  8
24  4   26  8
24  4   26  4
-   -   -   -
55  6   40  2
55  6   40  0
55  6   40  0

ID = participant number, ranging from 1 - 55
Se = task session, ranging from 1 - 6
Min = time counter during each task session, ranging from 1 - 40
Va = performance value for each task undertaken

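I need to average the performance value for every minute of every session for every participant. What is the best way to do this?

1 answer:

Answer 0 (score: 1)

To group by "every minute of every session for every participant", group by ['Min', 'Se', 'ID']: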

grouped = df.groupby(['Min', 'Se', 'ID'])
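
As a rough illustration of what this grouping does, the sketch below uses a few made-up rows in the question's column layout (the names `sample` and `sample_groups` and the values are illustrative, not the real data) and counts how many rows land in each group:

```python
import pandas as pd

# Made-up rows in the question's layout (ID, Se, Min, Va); not the real data.
sample = pd.DataFrame({
    'ID':  [1, 1, 1, 24, 24, 24],
    'Se':  [1, 1, 1, 4, 4, 4],
    'Min': [1, 1, 1, 26, 26, 26],
    'Va':  [2, 2, 3, 8, 8, 4],
})

sample_groups = sample.groupby(['Min', 'Se', 'ID'])

# One group per distinct (Min, Se, ID) combination.
print(sample_groups.ngroups)
# Number of rows that fall into each group.
print(sample_groups.size())
```

Each distinct combination of minute, session and participant becomes one group, so any aggregation applied afterwards is computed per combination.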

To find the mean performance for each group, compute

grouped.mean()

For example:

import numpy as np
import pandas as pd
np.random.seed(2015)
df = pd.DataFrame(np.random.randint(10, size=(10,4)), 
                  columns=['Min', 'Se', 'ID','Va'])

grouped = df.groupby(['Min', 'Se', 'ID'])
print(grouped.mean())

which yields

           Va
Min Se ID    
0   6  7    8
1   2  3    3
2   2  9    6
    3  1    2
3   8  6    9
    9  3    1
5   8  4    8
6   2  9    8
8   5  7    8
9   1  2    2
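
Applied to data shaped like the question's, a minimal sketch might look like the following. The rows are the ones visible in the question's excerpt, and the names `df_q` and `per_minute` are illustrative. Passing `as_index=False` is an optional choice, assumed here so that the result is a flat table rather than one with a MultiIndex. The key order `['ID', 'Se', 'Min']` is chosen to match the column order; it produces the same groups as `['Min', 'Se', 'ID']`, only sorted differently.

```python
import pandas as pd

# Rows copied from the excerpt shown in the question (the full data is much longer).
df_q = pd.DataFrame({
    'ID':  [1, 1, 1, 24, 24, 24, 55, 55, 55],
    'Se':  [1, 1, 1, 4, 4, 4, 6, 6, 6],
    'Min': [1, 1, 1, 26, 26, 26, 40, 40, 40],
    'Va':  [2, 2, 3, 8, 8, 4, 2, 0, 0],
})

# Mean of Va per participant, session and minute, returned as ordinary columns.
per_minute = df_q.groupby(['ID', 'Se', 'Min'], as_index=False)['Va'].mean()
print(per_minute)
```

Selecting `['Va']` before calling `mean()` keeps the aggregation limited to the performance column, which matters if the real data has additional columns.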