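I have a DataFrame that I want to group by several columns and then add a calculated column (an average) based on the grouping. Can anyone help?
The grouping itself works fine, but adding the calculated column (a rolling average) is giving me a lot of trouble.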
import pandas as pd
import numpy as np
df = pd.DataFrame([[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16], list('AAAAAAAABBBBBBBB'), ['RED','BLUE','GREEN','YELLOW','RED','BLUE','GREEN','YELLOW','RED','BLUE','GREEN','YELLOW','RED','BLUE','GREEN','YELLOW'], ['1','1','1','1','2','2','2','2','1','1','1','1','2','2','2','2'],[100,112,99,120,105,114,100,150,200,134,167,150,134,189,172,179]]).T
df.columns = ['id','Station','Train','month_code','total']
df2 = df.groupby(['Station','Train','month_code','total']).size().reset_index().groupby(['Station','Train','month_code'])['total'].max()
I'd like to get a result similar to the following:
Station Train  month_code total average
A       BLUE   1          112
               2          114   113
        GREEN  1          99    106.5
               2          100   99.5
        RED    1          100   100
               2          105   102.5
        YELLOW 1          120   112.5
               2          150   135
B       BLUE   1          134   142
               2          189   161.5
        GREEN  1          167   178
               2          172   169.5
        RED    1          200   186
               2          134   167
        YELLOW 1          150   142
               2          179   164.5
Answer 0 (score: 0)
How about changing the initial groupby so that it keeps the column name 'total':
df3 = df.groupby(['Station','Train','month_code']).sum()
>>> df3.head()
                          id  total
Station Train month_code
A       BLUE  1            2    112
              2            6    114
        GREEN 1            3     99
              2            7    100
        RED   1            1    100
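As a variant of the groupby above (not the answer's exact code), selecting 'total' before summing aggregates only that column, so 'id' never shows up and the column name is still kept. This is a minimal sketch that rebuilds the question's data from a dict, assuming integer dtypes are acceptable for 'id' and 'total':

```python
import pandas as pd

# The question's data, built column by column (assumption: integer dtypes
# for 'id' and 'total' instead of the object columns produced by .T).
df = pd.DataFrame({
    'id': range(1, 17),
    'Station': list('AAAAAAAABBBBBBBB'),
    'Train': ['RED', 'BLUE', 'GREEN', 'YELLOW'] * 4,
    'month_code': ['1'] * 4 + ['2'] * 4 + ['1'] * 4 + ['2'] * 4,
    'total': [100, 112, 99, 120, 105, 114, 100, 150,
              200, 134, 167, 150, 134, 189, 172, 179],
})

# Sum only 'total' per (Station, Train, month_code) group;
# .to_frame() turns the resulting Series back into a DataFrame named 'total'.
df3 = df.groupby(['Station', 'Train', 'month_code'])['total'].sum().to_frame()
print(df3.head())
```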
Then take a rolling average over the total column:
df3['average'] = df3['total'].rolling(2).mean()
>>> df3.head()
                          id  total  average
Station Train month_code
A       BLUE  1            2    112      NaN
              2            6    114    113.0
        GREEN 1            3     99    106.5
              2            7    100     99.5
        RED   1            1    100    100.0
You can still drop the id column if you don't need it.
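Putting the answer's steps together, a self-contained sketch might look like the following. It rebuilds the question's data from a dict rather than a transposed list (an assumption of this example), computes the rolling average, and then drops 'id':

```python
import pandas as pd

# The question's data, built from a dict so 'id' and 'total' are integers
# (the question's transposed list produces object columns instead).
df = pd.DataFrame({
    'id': range(1, 17),
    'Station': list('AAAAAAAABBBBBBBB'),
    'Train': ['RED', 'BLUE', 'GREEN', 'YELLOW'] * 4,
    'month_code': ['1'] * 4 + ['2'] * 4 + ['1'] * 4 + ['2'] * 4,
    'total': [100, 112, 99, 120, 105, 114, 100, 150,
              200, 134, 167, 150, 134, 189, 172, 179],
})

# Group without 'total' in the keys so the summed column keeps its name.
df3 = df.groupby(['Station', 'Train', 'month_code']).sum()

# Average each row with the row above it (window of 2, in sorted group order);
# the first row has no predecessor, so it gets NaN.
df3['average'] = df3['total'].rolling(2).mean()

# Drop the summed 'id' column, which is not needed in the result.
df3 = df3.drop(columns='id')
print(df3)
```

Note that rolling(2) runs down the whole 'total' column in the grouped sort order, so a row is averaged with the previous row even when that row belongs to another Train or Station (for example, A GREEN 1 gives (114 + 99) / 2 = 106.5). That matches the expected output in the question; if the window should instead restart inside each group, a grouped rolling would be needed.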