pandas: computing the mean of a column per group and for the whole DataFrame

Asked: 2018-05-18 16:10:38

Tags: python pandas dataframe pandas-groupby

I have a DataFrame df, where df['period'] = (df['date1'] - df['date2']) / np.timedelta64(1, 'D'):

code    y_m        date1        date2         period    
1000    201701    2017-12-10   2017-12-09       1
1000    201701    2017-12-14   2017-12-12       2
1000    201702    2017-12-15   2017-12-13       2
1000    201702    2017-12-17   2017-12-15       2
2000    201701    2017-12-19   2017-12-18       1
2000    201701    2017-12-12   2017-12-10       2
2000    201702    2017-12-11   2017-12-10       1
2000    201702    2017-12-13   2017-12-12       1
2000    201702    2017-12-11   2017-12-10       1
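To follow along, the sample can be rebuilt roughly as below. This is only a sketch: the rows are copied from the table above, and the column dtypes (integer code and y_m, datetime date1 and date2) are assumptions, since the question does not state them.

```python
import numpy as np
import pandas as pd

# Rows copied from the table in the question; dtypes are assumed.
df = pd.DataFrame({
    "code": [1000, 1000, 1000, 1000, 2000, 2000, 2000, 2000, 2000],
    "y_m": [201701, 201701, 201702, 201702, 201701, 201701, 201702, 201702, 201702],
    "date1": pd.to_datetime([
        "2017-12-10", "2017-12-14", "2017-12-15", "2017-12-17", "2017-12-19",
        "2017-12-12", "2017-12-11", "2017-12-13", "2017-12-11",
    ]),
    "date2": pd.to_datetime([
        "2017-12-09", "2017-12-12", "2017-12-13", "2017-12-15", "2017-12-18",
        "2017-12-10", "2017-12-10", "2017-12-12", "2017-12-10",
    ]),
})

# Difference between the two dates, expressed as a float number of days.
df["period"] = (df["date1"] - df["date2"]) / np.timedelta64(1, "D")
```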

Then I group by code and y_m to compute the mean of date1 - date2 (that is, period):

df_avg_period = df.groupby(['code', 'y_m'])['period'].mean().reset_index(name='avg_period')

code        y_m        avg_period
1000        201701     1.5
1000        201702     2
2000        201701     1.5
2000        201702     1

But I want to turn df_avg_period into a matrix, with the code values as rows and the y_m values as columns, for example:

      0     1       2       3
0    -1     0  201701  201702
1     0  1.44    1.44     1.4
2  1000  1.75     1.5       2
3  2000  1.20     1.5       1

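-1 is a dummy value: it marks a code / y_m cell for which no value exists, or it simply keeps the matrix rectangular. 0 means "all", i.e. the mean taken over every code, every y_m, or both. For example, cell (1,1) is the mean of period over all rows of df, and cell (1,2) is the mean of period over all rows of df whose y_m is 201701.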

Apparently pivot_table with mean does not give the correct result, so I would like to know how to do this properly.
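The question does not show how pivot_table was called. One plausible reason for the wrong numbers (an assumption, not something the asker confirms) is that the overall figures were taken from the already-averaged df_avg_period, which gives every (code, y_m) group the same weight regardless of how many rows it holds. A small sketch of the difference, using the period values from the table:

```python
import pandas as pd

# Only the columns needed here; period values copied from the question.
df = pd.DataFrame({
    "code": [1000, 1000, 1000, 1000, 2000, 2000, 2000, 2000, 2000],
    "y_m": [201701, 201701, 201702, 201702, 201701, 201701, 201702, 201702, 201702],
    "period": [1, 2, 2, 2, 1, 2, 1, 1, 1],
})

df_avg_period = (
    df.groupby(["code", "y_m"])["period"].mean().reset_index(name="avg_period")
)

# Mean of the four group means: each group counts once.
mean_of_means = df_avg_period["avg_period"].mean()

# Mean over the nine original rows: each row counts once.
mean_of_rows = df["period"].mean()

# The same comparison per code.
per_code_from_means = df_avg_period.groupby("code")["avg_period"].mean()
per_code_from_rows = df.groupby("code")["period"].mean()

print(mean_of_means, mean_of_rows)
print(per_code_from_means.to_dict(), per_code_from_rows.to_dict())
```

The two approaches agree for code 1000, whose groups are the same size, and disagree for code 2000, whose groups hold two and three rows. Computing the margins from the original df avoids the discrepancy.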

1 Answer:

Answer 0: (score: 1)

Use pivot_table on the original df with margins=True:

piv = df.pivot_table(
    index='code', columns='y_m', values='period', aggfunc='mean', margins=True
)
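# Housekeeping: move 'code' back into a column, drop the columns-axis name,
# relabel 'code' as -1 and 'All' as 0, then sort the column labels.
(piv.reset_index()
    .rename_axis(None, axis=1)
    .rename({'code': -1, 'All': 0}, axis=1)
    .sort_index(axis=1)
)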

     -1         0  201701  201702
0  1000  1.750000     1.5     2.0
1  2000  1.200000     1.5     1.0
2   All  1.444444     1.5     1.4
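The output above carries the right numbers but not quite the layout sketched in the question, where the labels form the first row and column and the "all" row comes first. One possible way to get there is shown below. It is only a sketch and not part of the original answer: the names row_order, col_order and matrix are made up for illustration, and it assumes code and y_m are integers.

```python
import numpy as np
import pandas as pd

# Period values copied from the question.
df = pd.DataFrame({
    "code": [1000, 1000, 1000, 1000, 2000, 2000, 2000, 2000, 2000],
    "y_m": [201701, 201701, 201702, 201702, 201701, 201701, 201702, 201702, 201702],
    "period": [1, 2, 2, 2, 1, 2, 1, 1, 1],
})

piv = df.pivot_table(
    index="code", columns="y_m", values="period", aggfunc="mean", margins=True
)

# Relabel the margins as 0 and move them to the front of both axes.
piv = piv.rename(index={"All": 0}, columns={"All": 0})
row_order = [0] + [c for c in piv.index if c != 0]
col_order = [0] + [c for c in piv.columns if c != 0]
piv = piv.loc[row_order, col_order]

# Build the matrix: -1 in the corner, labels along the first row and column.
matrix = np.full((len(row_order) + 1, len(col_order) + 1), -1.0)
matrix[0, 1:] = col_order
matrix[1:, 0] = row_order
# Missing code / y_m combinations come back as NaN; use -1 as the dummy value.
matrix[1:, 1:] = piv.fillna(-1).to_numpy()

result = pd.DataFrame(matrix)
print(result)
```

Everything is stored as floats here, so the labels print as 1000.0 and 201701.0. If the labels need to stay integers, keeping them in the index and columns, as in the answer, is probably the simpler choice.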