Understanding functions that operate on whole arrays in groupby aggregation

Time: 2014-12-17 12:38:47

Tags: arrays numpy pandas group-by aggregate

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'clients': pd.Series(['A', 'A', 'A', 'B', 'B']),
    'odd1': pd.Series([1, 1, 2, 1, 2]),
    'odd2': pd.Series([6, 7, 8, 9, 10])})

grpd = df.groupby(['clients', 'odd1']).agg({
    'odd2': lambda x: x/float(x.sum())
})
print(grpd)

The expected result is:

A   1   0.619047619
    2   0.380952381
B   1   0.473684211
    2   0.526316

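I have browsed around, but I still don't understand how lambda functions that operate on a whole array, e.g. x.sum(), work. I also still don't see what x in x.sum() refers to with respect to the grouped columns.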

1 answer:

Answer 0: (score: 3)

You can do this:

>>> df.groupby(['clients', 'odd1'])['odd2'].sum() / df.groupby('clients')['odd2'].sum()
clients  odd1
A        1       0.619
         2       0.381
B        1       0.474
         2       0.526
Name: odd2, dtype: float64
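
The division above works because the two results share the index level clients. As a minimal sketch of the same idea with that alignment spelled out, Series.div can be given level='clients' explicitly. The names pair_totals, client_totals and shares are only illustrative and do not appear in the question:

```python
import pandas as pd

df = pd.DataFrame({
    'clients': ['A', 'A', 'A', 'B', 'B'],
    'odd1': [1, 1, 2, 1, 2],
    'odd2': [6, 7, 8, 9, 10],
})

# Sum of odd2 for every (clients, odd1) pair: indexed by two levels.
pair_totals = df.groupby(['clients', 'odd1'])['odd2'].sum()

# Sum of odd2 for every client: indexed by clients only.
client_totals = df.groupby('clients')['odd2'].sum()

# Divide each pair total by the total of its client, matching on the
# 'clients' level of the two-level index.
shares = pair_totals.div(client_totals, level='clients')
print(shares)
```

Within each client the shares add up to 1, which is a quick way to check that the denominator is the per-client total rather than the per-pair total.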

Alternatively, use .transform to get each value's share within its clients group, and then sum those shares for each (clients, odd1) group:

>>> df['val'] = df['odd2'] / df.groupby('clients')['odd2'].transform('sum')
>>> df
  clients  odd1  odd2    val
0       A     1     6  0.286
1       A     1     7  0.333
2       A     2     8  0.381
3       B     1     9  0.474
4       B     2    10  0.526
>>> df.groupby(['clients', 'odd1'])['val'].sum()
clients  odd1
A        1       0.619
         2       0.381
B        1       0.474
         2       0.526
Name: val, dtype: float64
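
As for what x is: the function passed to a groupby operation receives the odd2 values of one group at a time, as a Series. Grouping by both clients and odd1 therefore makes x.sum() the total of that pair only, and x / x.sum() yields one value per row, whereas agg expects a single value per group. The sketch below, which is only an illustration and not the only way to write this, prints each x and then reuses the lambda from the question with transform on the clients grouping. The names val and result are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'clients': ['A', 'A', 'A', 'B', 'B'],
    'odd1': [1, 1, 2, 1, 2],
    'odd2': [6, 7, 8, 9, 10],
})

# Each x is the 'odd2' Series of one (clients, odd1) group, so x.sum()
# here is the total of that pair, not the total of the client.
for key, x in df.groupby(['clients', 'odd1'])['odd2']:
    print(key, x.tolist(), x.sum())

# Grouping by clients alone makes x.sum() the per-client total.
# transform keeps one value per original row, so returning the
# Series x / x.sum() is allowed here.
val = df.groupby('clients')['odd2'].transform(lambda x: x / x.sum())

# Add up the row shares inside each (clients, odd1) pair.
result = val.groupby([df['clients'], df['odd1']]).sum()
print(result)
```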