I have a large dataset (length = 454605) that looks like this:
ID Se Min Va
1 1 1 2
1 1 1 2
1 1 1 3
- - - -
24 4 26 8
24 4 26 8
24 4 26 4
- - - -
55 6 40 2
55 6 40 0
55 6 40 0
ID = participant number, ranging from 1 - 55
Se = task session, ranging from 1 - 6
Min = time counter during each task session, ranging from 1 - 40
Va = performance value for each task undertaken
For each participant, I need to average the performance value for each minute of each session. What is the best way to do this?
Answer 0 (score: 1)
To group by "each minute of each session for each participant", group by ['Min', 'Se', 'ID']:
grouped = df.groupby(['Min', 'Se', 'ID'])
To find the mean performance for each group, compute
grouped.mean()
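For example,
import numpy as np
import pandas as pd
np.random.seed(2015)
# Ten rows of random integers in [0, 10) as stand-in data
df = pd.DataFrame(np.random.randint(10, size=(10, 4)),
                  columns=['Min', 'Se', 'ID', 'Va'])
# One group per (minute, session, participant) combination
grouped = df.groupby(['Min', 'Se', 'ID'])
# Mean of Va within each group
print(grouped.mean())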
yields
           Va
Min Se ID
0   6  7    8
1   2  3    3
2   2  9    6
    3  1    2
3   8  6    9
    9  3    1
5   8  4    8
6   2  9    8
8   5  7    8
9   1  2    2
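The same grouping can be tried on the column layout from the question. The following is a minimal sketch, not the only way to do it: it assumes the data is already in a DataFrame with columns ID, Se, Min and Va, and it uses only the nine rows shown in the question (the "- - - -" rows are left out). Two choices here are assumptions rather than part of the answer above: the keys are ordered ['ID', 'Se', 'Min'] so the result reads participant by participant, and as_index=False is passed so the keys come back as ordinary columns instead of a MultiIndex.

```python
import pandas as pd

# The nine rows shown in the question; the elided "- - - -" rows are omitted.
df = pd.DataFrame(
    {
        "ID":  [1, 1, 1, 24, 24, 24, 55, 55, 55],
        "Se":  [1, 1, 1, 4, 4, 4, 6, 6, 6],
        "Min": [1, 1, 1, 26, 26, 26, 40, 40, 40],
        "Va":  [2, 2, 3, 8, 8, 4, 2, 0, 0],
    }
)

# Group by participant, session, and minute, then average only the Va column.
# as_index=False keeps ID, Se, and Min as regular columns in the result.
means = df.groupby(["ID", "Se", "Min"], as_index=False)["Va"].mean()
print(means)
```

With these rows there are three groups, and their Va means are 7/3, 20/3 and 2/3. Selecting ["Va"] before calling mean() restricts the result to that column, which matters if the real dataset has other numeric columns.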