Python Pandas: add a column to a multi-index GroupBy DataFrame

Time: 2017-05-16 20:46:42

Tags: python pandas dataframe group-by

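I am trying to add a column to a Pandas GroupBy DataFrame that has multi-index columns. The new column is the difference between the maximum and the mean of Reads within each group of keys.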

Here is the input DataFrame:

   Main  Reads  Test  Subgroup
0     1      5    54         1
1     2      2    55         1
2     1     10    56         2
3     2     20    57         3
4     1      7    58         3

Here is the code:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Main': [1, 2, 1, 2, 1], 'Reads': [5, 2, 10, 20, 7],
                   'Test': range(54, 59), 'Subgroup': [1, 1, 2, 3, 3]})

result = df.groupby(['Main','Subgroup']).agg({'Reads':[np.max,np.mean]})

Here is the variable result before the diff computation:

              Reads     
               amax mean
Main Subgroup           
1    1            5    5
     2           10   10
     3            7    7
2    1            2    2
     3           20   20

Next, I compute diff with the following:

result['Reads']['diff'] = result['Reads']['amax'] - result['Reads']['mean']

But this is the output:

/home/userd/test.py:9: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/
...stable/indexing.html#indexing-view-versus-copy
...result['Reads']['diff'] = result['Reads']['amax'] - result['Reads']['mean']

I want the diff column to sit at the same level as amax and mean.

Is there a way to add a column to the innermost (bottom) column index level of a multi-index GroupBy() result in Pandas?

3 Answers:

Answer 0: (score: 3)

You can access the multi-index with a tuple:

result[('Reads','diff')] = result[('Reads','amax')] - result[('Reads','mean')]

You get:

                    Reads
                    amax    mean    diff
Main    Subgroup            
1       1           5       5       0
        2          10      10       0
        3           7       7       0
2       1           2       2       0
        3          20      20       0
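As a rough, self-contained sketch of why the tuple form helps: result['Reads'] appears to return a new DataFrame, so a second ['diff'] = ... assigns to that temporary object and pandas warns. A single tuple key addresses both column levels in one assignment on result itself. Note one assumption that differs from the question: the string names 'max' and 'mean' are used instead of np.max and np.mean, so the first column is labelled max rather than amax.

```python
import pandas as pd

df = pd.DataFrame({
    'Main': [1, 2, 1, 2, 1],
    'Reads': [5, 2, 10, 20, 7],
    'Test': range(54, 59),
    'Subgroup': [1, 1, 2, 3, 3],
})

# Assumption: string aggregator names instead of np.max / np.mean,
# so the columns are ('Reads', 'max') and ('Reads', 'mean').
result = df.groupby(['Main', 'Subgroup']).agg({'Reads': ['max', 'mean']})

# One tuple key covers both column levels, so this is a single
# assignment on `result` rather than on an intermediate object.
result[('Reads', 'diff')] = result[('Reads', 'max')] - result[('Reads', 'mean')]

print(result)
```

Every (Main, Subgroup) pair in this sample holds exactly one row, which is why max equals mean and diff is 0 throughout.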

Answer 1: (score: 2)

Try this:

In [8]: result = df.groupby(['Main','Subgroup']).agg({'Reads':[np.max,np.mean, lambda x: x.max()-x.mean()]})

In [9]: result
Out[9]:
              Reads
               amax mean <lambda>
Main Subgroup
1    1            5    5        0
     2           10   10        0
     3            7    7        0
2    1            2    2        0
     3           20   20        0

In [10]: result = result.rename(columns={'<lambda>':'diff'})

In [11]: result
Out[11]:
              Reads
               amax mean diff
Main Subgroup
1    1            5    5    0
     2           10   10    0
     3            7    7    0
2    1            2    2    0
     3           20   20    0
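The label that pandas gives an anonymous function may differ between versions (for example `<lambda_0>` instead of `<lambda>`), in which case the rename above would silently do nothing. A hedged variation is to pass a named function, because its name is used as the column label. The helper diff below is a hypothetical name chosen for this sketch, and the string aggregators 'max' and 'mean' are again an assumption in place of np.max and np.mean.

```python
import pandas as pd

df = pd.DataFrame({
    'Main': [1, 2, 1, 2, 1],
    'Reads': [5, 2, 10, 20, 7],
    'Test': range(54, 59),
    'Subgroup': [1, 1, 2, 3, 3],
})


def diff(reads):
    """Return max minus mean for one group's values.

    Hypothetical helper: its __name__ ('diff') becomes the column label.
    """
    return reads.max() - reads.mean()


# The named function is listed alongside the string aggregators,
# so no rename step is needed afterwards.
result = df.groupby(['Main', 'Subgroup']).agg({'Reads': ['max', 'mean', diff]})

print(result)
```

This keeps diff under the Reads level next to max and mean without relying on the placeholder name of a lambda.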

Answer 2: (score: 2)

# You can use a lambda to build diff directly.
df.groupby(['Main','Subgroup']).agg({'Reads':[np.max,np.mean,lambda x: np.max(x)-np.mean(x)]}).rename(columns={'<lambda>':'diff'})
Out[2360]: 
              Reads          
               amax mean diff
Main Subgroup                
1    1            5    5    0
     2           10   10    0
     3            7    7    0
2    1            2    2    0
     3           20   20    0