Groupby equivalent of a pivot with multiple aggregations on multiple columns

Posted: 2019-08-23 12:07:26

Tags: python pandas dataframe

First of all, this post was very helpful: How to pivot a dataframe

Now I have the following code:

import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
# Mean of D; count, max and mean of E, for each (A, C) pair.
# String names avoid the FutureWarning recent pandas raises for np.mean/max.
table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
                       aggfunc={'D': 'mean',
                                'E': ['count', 'max', 'mean']})
# to_records() moves A and C out of the index and turns each column
# tuple into a single label such as ('D', 'mean')
flattened = pd.DataFrame(table.to_records())

This gives the following result, which is what I want:

     A      C  ('D', 'mean')  ('E', 'count')  ('E', 'max')  ('E', 'mean')
0  bar  large       5.500000             2.0           9.0       7.500000
1  bar  small       5.500000             2.0           9.0       8.500000
2  foo  large       2.000000             2.0           5.0       4.500000
3  foo  small       2.333333             3.0           6.0       4.333333

Is there an equivalent command using groupby? Something like: df.groupby(['row', 'col'])['val0'].agg(['size', 'mean']).unstack(fill_value=0)

Or is this already the most efficient way?

1 answer:

Answer 0 (score: 2)

The groupby alternative is to pass the same aggregation dictionary to agg:

df = df.groupby(['A', 'C']).agg({'D': 'mean', 'E': ['count', 'max', 'mean']})
print(df)
                  D     E              
               mean count max      mean
A   C                                  
bar large  5.500000     2   9  7.500000
    small  5.500000     2   9  8.500000
foo large  2.000000     2   5  4.500000
    small  2.333333     3   6  4.333333
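A quick sketch for checking that this groupby call reproduces the question's pivot_table result on the sample data. It assumes a recent pandas and uses the string names 'mean' and 'max' instead of np.mean and the builtin max. The dtype check is switched off because some pandas versions may return the pivot_table counts as floats, as the 2.0 values in the question's output suggest.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})

# One aggregation spec shared by both calls
aggs = {"D": "mean", "E": ["count", "max", "mean"]}

# pivot_table version (as in the question)
table = pd.pivot_table(df, values=["D", "E"], index=["A", "C"], aggfunc=aggs)

# groupby version (as in the answer)
grouped = df.groupby(["A", "C"]).agg(aggs)

# Same index, same MultiIndex columns, same values; dtypes not compared
pd.testing.assert_frame_equal(table, grouped, check_dtype=False)
```

Both calls group on the same keys and sort the groups the same way, so the comparison covers the index order as well as the values.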


# Join each (column, statistic) tuple into one name, e.g. ('D', 'mean') -> 'D_mean'
df.columns = df.columns.map('_'.join)
print(df)
             D_mean  E_count  E_max    E_mean
A   C                                        
bar large  5.500000        2      9  7.500000
    small  5.500000        2      9  8.500000
foo large  2.000000        2      5  4.500000
    small  2.333333        3      6  4.333333
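The result above still keeps A and C in the index, while the question's flattened frame has them as ordinary columns. A minimal sketch that adds reset_index() for that layout, and also shows named aggregation (available since pandas 0.25) as an alternative that produces flat column names directly. The output names such as D_mean are chosen here for illustration; they are not required by pandas.

```python
import pandas as pd

# Sample data from the question (column B is not used by the aggregation)
df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})

# Option 1: the answer's dict + '_'.join, then move A and C out of the index
joined = df.groupby(["A", "C"]).agg({"D": "mean", "E": ["count", "max", "mean"]})
joined.columns = joined.columns.map("_".join)
joined = joined.reset_index()

# Option 2: named aggregation, new_name=(source_column, function),
# so there is no MultiIndex to join afterwards
named = (
    df.groupby(["A", "C"])
      .agg(D_mean=("D", "mean"),
           E_count=("E", "count"),
           E_max=("E", "max"),
           E_mean=("E", "mean"))
      .reset_index()
)

# Both options build the same flat frame
pd.testing.assert_frame_equal(joined, named)
print(named)
```

Named aggregation is mostly a readability choice: the output column names sit right next to the statistic that produces them, which avoids a separate renaming step.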