Consider this DataFrame:
import json
import pandas
df = pandas.DataFrame(json.loads("""[
{"epoch": 1, "phase": "train", "value": 1.0, "name": "kl"},
{"epoch": 1, "phase": "train", "value": 4.0, "name": "kl"},
{"epoch": 1, "phase": "train", "value": 3.0, "name": "kl"},
{"epoch": 1, "phase": "train", "value": 1.0, "name": "kl"},
{"epoch": 1, "phase": "train", "value": 0.0, "name": "ll"},
{"epoch": 1, "phase": "train", "value": 4.0, "name": "ll"},
{"epoch": 1, "phase": "train", "value": 9.0, "name": "ll"},
{"epoch": 1, "phase": "train", "value": 3.0, "name": "ll"},
{"epoch": 1, "phase": "train", "value": 2.0, "name": "loss"},
{"epoch": 1, "phase": "train", "value": 9.0, "name": "loss"},
{"epoch": 1, "phase": "train", "value": 5.0, "name": "loss"},
{"epoch": 1, "phase": "train", "value": 4.0, "name": "loss"},
{"epoch": 1, "phase": "test", "value": 1.0, "name": "kl"},
{"epoch": 1, "phase": "test", "value": 6.0, "name": "kl"},
{"epoch": 1, "phase": "test", "value": 3.0, "name": "kl"},
{"epoch": 1, "phase": "test", "value": 3.0, "name": "kl"},
{"epoch": 1, "phase": "test", "value": 1.0, "name": "ll"},
{"epoch": 1, "phase": "test", "value": 2.0, "name": "ll"},
{"epoch": 1, "phase": "test", "value": 0.0, "name": "ll"},
{"epoch": 1, "phase": "test", "value": 3.0, "name": "ll"},
{"epoch": 1, "phase": "test", "value": 1.0, "name": "loss"},
{"epoch": 1, "phase": "test", "value": 0.0, "name": "loss"},
{"epoch": 1, "phase": "test", "value": 9.0, "name": "loss"},
{"epoch": 1, "phase": "test", "value": 4.0, "name": "loss"}]"""))
Printed, it looks like this:
    epoch  phase  value  name
0       1  train    1.0    kl
1       1  train    4.0    kl
2       1  train    3.0    kl
3       1  train    1.0    kl
4       1  train    0.0    ll
5       1  train    4.0    ll
6       1  train    9.0    ll
7       1  train    3.0    ll
8       1  train    2.0  loss
9       1  train    9.0  loss
10      1  train    5.0  loss
11      1  train    4.0  loss
12      1   test    1.0    kl
13      1   test    6.0    kl
14      1   test    3.0    kl
15      1   test    3.0    kl
16      1   test    1.0    ll
17      1   test    2.0    ll
18      1   test    0.0    ll
19      1   test    3.0    ll
20      1   test    1.0  loss
21      1   test    0.0  loss
22      1   test    9.0  loss
23      1   test    4.0  loss
For rows that are identical in every column except value, I want to compute the mean (or stddev) over their multiple value entries:
   epoch  phase  value  name
0      1  train   2.25    kl
1      1  train   4.00    ll
2      1  train   5.00  loss
3      1   test   3.25    kl
4      1   test   1.50    ll
5      1   test   3.50  loss
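To make the requested operation concrete independently of pandas, here is a minimal pure-Python sketch: it groups records by every field except value and reports the mean and sample standard deviation of each group. The helper name group_stats is hypothetical, and only a subset of the question's records is used to keep it short.

```python
import statistics
from collections import defaultdict


def group_stats(records, value_key="value"):
    """Hypothetical helper: group dicts by every key except value_key.

    Returns {group_key: (mean, sample_stdev)}, where group_key is a sorted
    tuple of (field, value) pairs for all fields other than value_key.
    """
    groups = defaultdict(list)
    for rec in records:
        # The grouping key is every (field, value) pair except the measurement.
        key = tuple(sorted((k, v) for k, v in rec.items() if k != value_key))
        groups[key].append(rec[value_key])
    result = {}
    for key, values in groups.items():
        # statistics.stdev is the sample standard deviation and needs >= 2 values.
        std = statistics.stdev(values) if len(values) > 1 else float("nan")
        result[key] = (statistics.mean(values), std)
    return result


# A subset of the question's records: the train/kl and test/ll groups.
records = [
    {"epoch": 1, "phase": "train", "value": 1.0, "name": "kl"},
    {"epoch": 1, "phase": "train", "value": 4.0, "name": "kl"},
    {"epoch": 1, "phase": "train", "value": 3.0, "name": "kl"},
    {"epoch": 1, "phase": "train", "value": 1.0, "name": "kl"},
    {"epoch": 1, "phase": "test", "value": 1.0, "name": "ll"},
    {"epoch": 1, "phase": "test", "value": 2.0, "name": "ll"},
    {"epoch": 1, "phase": "test", "value": 0.0, "name": "ll"},
    {"epoch": 1, "phase": "test", "value": 3.0, "name": "ll"},
]

for key, (mean, std) in group_stats(records).items():
    print(dict(key), round(mean, 2), round(std, 2))
```

The sorted tuple makes the key independent of the order in which fields appear in each record, which is what "all columns except value" means for dicts.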
Answer 0 (score: 3)
This should do it:
df.groupby(['epoch', 'phase', 'name']).mean()
Output:
                  value
epoch phase name
1     test  kl     3.25
            ll     1.50
            loss   3.50
      train kl     2.25
            ll     4.00
            loss   5.00
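The result above is indexed by a MultiIndex, with groups sorted alphabetically (test before train). To get the flat layout shown in the question instead, one option, sketched here assuming pandas is installed, is to pass as_index=False so the keys stay ordinary columns, sort=False to keep first-appearance order, and then restore the original column order. The DataFrame is rebuilt from the question's values in a compact form.

```python
import pandas as pd

# Rebuild the question's data compactly: four values per (phase, name) group,
# in the same row order as the original JSON.
groups = [
    ("train", "kl", [1.0, 4.0, 3.0, 1.0]),
    ("train", "ll", [0.0, 4.0, 9.0, 3.0]),
    ("train", "loss", [2.0, 9.0, 5.0, 4.0]),
    ("test", "kl", [1.0, 6.0, 3.0, 3.0]),
    ("test", "ll", [1.0, 2.0, 0.0, 3.0]),
    ("test", "loss", [1.0, 0.0, 9.0, 4.0]),
]
df = pd.DataFrame(
    [
        {"epoch": 1, "phase": phase, "value": v, "name": name}
        for phase, name, values in groups
        for v in values
    ]
)

# as_index=False keeps the group keys as regular columns (no MultiIndex);
# sort=False keeps groups in order of first appearance (train before test).
flat = df.groupby(["epoch", "phase", "name"], as_index=False, sort=False)["value"].mean()

# Put the columns back in the original order: epoch, phase, value, name.
flat = flat[df.columns]
print(flat)
```

Calling .reset_index() on the MultiIndex result is an equivalent way to flatten it, but without sort=False the rows stay in sorted order.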
Answer 1 (score: 2)
To group automatically by every column except value:
df.groupby(list(df.columns.drop('value'))).mean()
Same result:
                  value
epoch phase name
1     test  kl     3.25
            ll     1.50
            loss   3.50
      train kl     2.25
            ll     4.00
            loss   5.00
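Since the question also asks about stddev, the same column-list grouping can be combined with agg to compute both statistics in one pass. This is a sketch assuming pandas is installed; note that pandas' "std" defaults to the sample standard deviation (ddof=1).

```python
import pandas as pd

# Rebuild the question's data compactly: four values per (phase, name) group.
groups = {
    ("train", "kl"): [1.0, 4.0, 3.0, 1.0],
    ("train", "ll"): [0.0, 4.0, 9.0, 3.0],
    ("train", "loss"): [2.0, 9.0, 5.0, 4.0],
    ("test", "kl"): [1.0, 6.0, 3.0, 3.0],
    ("test", "ll"): [1.0, 2.0, 0.0, 3.0],
    ("test", "loss"): [1.0, 0.0, 9.0, 4.0],
}
df = pd.DataFrame(
    [
        {"epoch": 1, "phase": phase, "value": v, "name": name}
        for (phase, name), values in groups.items()
        for v in values
    ]
)

# Group by every column except "value": here epoch, phase and name.
keys = list(df.columns.drop("value"))

# A list of function names yields one output column per statistic;
# reset_index turns the group keys back into ordinary columns.
stats = df.groupby(keys)["value"].agg(["mean", "std"]).reset_index()
print(stats)
```

For the population standard deviation instead, df.groupby(keys)["value"].std(ddof=0) can be used.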