>>> import numpy as np
>>> import pandas as pd
>>> df
   group  valueCol
0      1         1
1      1         2
2      1         3
3      2         4
4      2         5
5      3         6
>>> df.dtypes
group       int64
valueCol    int64
dtype: object
>>>
This makes sense:
>>> df.groupby('group')['valueCol'].agg({'mean': np.mean, 'sum': sum, 'len': len})
       sum  mean  len
group
1        6   2.0    3
2        9   4.5    2
3        6   6.0    1
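This does not make sense. I expected each column to show a different value depending on the adjustment. Instead, the last value from the dict comprehension is always the one used, and it is copied into every column. Is this expected?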
>>> df.groupby('group')['valueCol'].agg({'adjust-by-' + str(diff): lambda x: len(x) + diff for diff in [0, 1, 2]})
       adjust-by-0  adjust-by-1  adjust-by-2
group
1                5            5            5
2                4            4            4
3                3            3            3
>>> df.groupby('group')['valueCol'].agg({'adjust-by-' + str(diff): lambda x: len(x) + diff for diff in [2, 1, 0]})
       adjust-by-0  adjust-by-1  adjust-by-2
group
1                3            3            3
2                2            2            2
3                1            1            1
>>>
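As far as I can tell, the behaviour above comes from Python rather than pandas: a lambda looks up `diff` when it is called, not when it is defined, so every lambda sees whatever value the comprehension ended on. Here is a minimal pandas-free sketch; the names `late`, `bound` and `data` are made up for illustration. Binding the current value as a default argument (`d=d`) is one common way to freeze it per lambda.

```python
# Each lambda closes over the variable `d`, not over its value at definition
# time, so all of them see the final value (2) when they are called.
late = {d: (lambda x: len(x) + d) for d in [0, 1, 2]}

# A default argument is evaluated when the lambda is defined, so each
# lambda keeps its own copy of the value.
bound = {d: (lambda x, d=d: len(x) + d) for d in [0, 1, 2]}

data = [10, 20, 30]  # stand-in for one group of three rows

late_results = [late[k](data) for k in [0, 1, 2]]
bound_results = [bound[k](data) for k in [0, 1, 2]]

print(late_results)   # [5, 5, 5]
print(bound_results)  # [3, 4, 5]
```

This would also explain why reversing the list to `[2, 1, 0]` gives 3, 2, 1 in every column: the last value is then 0.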
Edit:
I wanted to define different lambda functions, but they all turn out to behave the same.
>>> functions = {i: lambda x: len(x) + i for i in [0, 1, 2]}
>>> functions
{0: <function <dictcomp>.<lambda> at 0x107447510>, 1: <function <dictcomp>.<lambda> at 0x1074472f0>, 2: <function <dictcomp>.<lambda> at 0x1074471e0>}
>>> df.groupby('group')['valueCol'].agg(lambda x: functions[1](x))
group
1    5
2    4
3    3
Name: valueCol, dtype: int64
>>> df.groupby('group')['valueCol'].agg(lambda x: functions[2](x))
group
1    5
2    4
3    3
Name: valueCol, dtype: int64
The following works. I guess the way I was generating the lambda functions was wrong.
>>> def crtfunc(i):
...     return lambda x: len(x) + i
...
>>> crtfunc(2)
<function crtfunc.<locals>.<lambda> at 0x1074477b8>
>>> crtfunc(2)([1,2,3])
5
>>> functions3 = {i: crtfunc(i) for i in [0, 1, 2]}
>>> functions3
{0: <function crtfunc.<locals>.<lambda> at 0x1074479d8>, 1: <function crtfunc.<locals>.<lambda> at 0x107447840>, 2: <function crtfunc.<locals>.<lambda> at 0x107447620>}
>>> df.groupby('group')['valueCol'].agg(functions3[0])
group
1    3
2    2
3    1
Name: valueCol, dtype: int64
>>> df.groupby('group')['valueCol'].agg(functions3[2])
group
1    5
2    4
3    3
Name: valueCol, dtype: int64
>>> df.groupby('group')['valueCol'].agg({'adjust-by'+str(i): functions3[i] for i in [0, 1, 2]})
       adjust-by0  adjust-by2  adjust-by1
group
1               3           5           4
2               2           4           3
3               1           3           2
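The factory function works because each call to `crtfunc` creates a new scope with its own `i`. Another way to get the same effect, I believe, is `functools.partial`, which fixes the argument value when the callable is created. The sketch below is plain Python; `adjusted_len` and `funcs` are hypothetical names, and I have not checked how every pandas version handles partial objects passed to `agg`.

```python
from functools import partial


def adjusted_len(x, diff):
    """Return the length of x plus a fixed adjustment."""
    return len(x) + diff


# partial() stores the value of d at creation time, so each entry keeps
# its own adjustment instead of sharing the loop variable.
funcs = {'adjust-by-' + str(d): partial(adjusted_len, diff=d) for d in [0, 1, 2]}

sample = [4, 5]  # stand-in for a group with two rows
results = {name: f(sample) for name, f in funcs.items()}
print(results)  # {'adjust-by-0': 2, 'adjust-by-1': 3, 'adjust-by-2': 4}
```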