pandas groupby with lambda expression and dict comprehension

Time: 2016-01-13 09:11:06

Tags: python pandas lambda aggregate

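>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'group': [1, 1, 1, 2, 2, 3], 'valueCol': [1, 2, 3, 4, 5, 6]})
>>> df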
   group  valueCol
0      1         1
1      1         2
2      1         3
3      2         4
4      2         5
5      3         6
>>> df.dtypes
group       int64
valueCol    int64
dtype: object
>>> 

This makes sense:

>>> df.groupby('group')['valueCol'].agg({'mean': np.mean, 'sum': sum, 'len': len})
       sum  mean  len
group                
1        6   2.0    3
2        9   4.5    2
3        6   6.0    1

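This does not make sense to me. I expected each column to show a different value depending on the adjustment, but the last value of the dict comprehension is always used and copied into every column. Is this expected?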

>>> df.groupby('group')['valueCol'].agg({'adjust-by-' + str(diff): lambda x: len(x) + diff for diff in [0, 1, 2]})
       adjust-by-0  adjust-by-1  adjust-by-2
group                                       
1                5            5            5
2                4            4            4
3                3            3            3
>>> df.groupby('group')['valueCol'].agg({'adjust-by-' + str(diff): lambda x: len(x) + diff for diff in [2, 1, 0]})
       adjust-by-0  adjust-by-1  adjust-by-2
group                                       
1                3            3            3
2                2            2            2
3                1            1            1
>>> 
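The behaviour above is consistent with Python's late-binding closures: each lambda looks up `diff` when it is called, not when it is created, so every lambda sees the final loop value. Below is a minimal sketch without pandas, assuming a plain list as the input; the names `late`, `bound` and `sample` are only illustrative. Binding `diff` as a default argument is one common way to freeze the value per lambda.

```python
# Late binding: `diff` is looked up at call time, after the comprehension
# has finished, so every lambda adds the last value (2).
late = {'adjust-by-' + str(diff): (lambda x: len(x) + diff)
        for diff in [0, 1, 2]}

# Default argument: `diff=diff` is evaluated when each lambda is created,
# so every lambda keeps its own value.
bound = {'adjust-by-' + str(diff): (lambda x, diff=diff: len(x) + diff)
         for diff in [0, 1, 2]}

sample = [1, 2, 3]
late_results = {name: f(sample) for name, f in late.items()}
bound_results = {name: f(sample) for name, f in bound.items()}

print(late_results)   # all three entries are len(sample) + 2
print(bound_results)  # entries differ by the bound diff
```

The `bound` lambdas still accept a single positional argument, so they should be usable wherever the original lambdas were passed.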

Edit:

I expected to define different lambda functions, but they all behave the same.

>>> functions = {i: lambda x: len(x) + i for i in [0, 1, 2]}
>>> functions
{0: <function <dictcomp>.<lambda> at 0x107447510>, 1: <function <dictcomp>.<lambda> at 0x1074472f0>, 2: <function <dictcomp>.<lambda> at 0x1074471e0>}
>>> df.groupby('group')['valueCol'].agg(lambda x: functions[1](x))
group
1    5
2    4
3    3
Name: valueCol, dtype: int64
>>> df.groupby('group')['valueCol'].agg(lambda x: functions[2](x))
group
1    5
2    4
3    3
Name: valueCol, dtype: int64

The following works. I guess my way of generating the lambda functions was wrong.

>>> def crtfunc(i):
...     return lambda x: len(x) + i

>>> crtfunc(2)
<function crtfunc.<locals>.<lambda> at 0x1074477b8>

>>> crtfunc(2)([1,2,3])
5

>>> functions3 = {i: crtfunc(i) for i in [0, 1, 2]}
>>> functions3
{0: <function crtfunc.<locals>.<lambda> at 0x1074479d8>, 1: <function crtfunc.<locals>.<lambda> at 0x107447840>, 2: <function crtfunc.<locals>.<lambda> at 0x107447620>}

>>> df.groupby('group')['valueCol'].agg(functions3[0])
group
1    3
2    2
3    1
Name: valueCol, dtype: int64

>>> df.groupby('group')['valueCol'].agg(functions3[2])
group
1    5
2    4
3    3
Name: valueCol, dtype: int64

>>> df.groupby('group')['valueCol'].agg({'adjust-by'+str(i): functions3[i] for i in [0, 1, 2]})

       adjust-by0  adjust-by2  adjust-by1
group
1               3           5           4
2               2           4           3
3               1           3           2
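The factory function works because each call to `crtfunc` creates a fresh scope with its own `i`. A possible alternative, sketched here without pandas, is `functools.partial`, which also fixes the argument at creation time. The helper `adjusted_len` and the names `funcs` and `results` are assumptions made for illustration only.

```python
from functools import partial

def adjusted_len(x, diff):
    # Length of x plus a fixed adjustment.
    return len(x) + diff

# partial() stores the current value of d inside each callable,
# so nothing is looked up later through a shared variable.
funcs = {'adjust-by-' + str(d): partial(adjusted_len, diff=d)
         for d in [0, 1, 2]}

results = {name: f([1, 2, 3]) for name, f in funcs.items()}
print(results)
```

Compared with the lambda factory, `partial` keeps the bound value visible through the `keywords` attribute of each callable, which may help when debugging.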

0 answers:

No answers yet