Suppose I have:
| channel | sum | txn | value | count | group
0 | A | null | null | 2 | 1 | 1
1 | A | null | null | 3 | 3 | 1
2 | B | null | null | 4 | 4 | 2
3 | C | null | null | 2 | 2 | 2
4 | A | null | null | 1 | 5 | 1
When I run
df.loc[df['group'] == 1 ,['sum','txn']] = df.loc[df['group'] == 1].groupby(['channel'])['value','count'].apply(lambda x: x+1)
it does not assign the values to my DataFrame.
It should look like this:
| channel | sum | txn | value | count | group
0 | A | 3 | 2 | 2 | 1 | 1
1 | A | 4 | 4 | 3 | 3 | 1
2 | B | null | null | 4 | 4 | 2
3 | C | null | null | 2 | 2 | 2
4 | A | 2 | 6 | 1 | 5 | 1
Answer 0 (score: 2)
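The operation here (adding 1) is not computed per group, so the solution can be simplified by omitting groupby. The result must be converted to a NumPy array, because otherwise pandas aligns the right-hand side on its column labels (value, count), which do not match the target columns (sum, txn), so the targets are filled with NaN instead of the new values: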
m = df['group'] == 1
df.loc[m ,['sum','txn']] = (df.loc[m, ['value','count']] + 1).to_numpy()
# older pandas versions (before 0.24, which added to_numpy):
#df.loc[m, ['sum','txn']] = (df.loc[m, ['value','count']] + 1).values
print (df)
  channel  sum  txn  value  count  group
0       A  3.0  2.0      2      1      1
1       A  4.0  4.0      3      3      1
2       B  NaN  NaN      4      4      2
3       C  NaN  NaN      2      2      2
4       A  2.0  6.0      1      5      1
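The label alignment behind this can be checked directly. The sketch below rebuilds the sample frame (assuming the null cells are NaN; make_frame is a hypothetical helper), shows that assigning the labeled right-hand side leaves sum and txn empty, and then shows an alternative that is not part of the answer above: renaming the right-hand columns to sum and txn so that alignment puts the values in place without converting to NumPy.

```python
import numpy as np
import pandas as pd


def make_frame():
    # Sample data from the question; "null" is assumed to mean NaN.
    return pd.DataFrame({
        "channel": ["A", "A", "B", "C", "A"],
        "sum": np.nan,
        "txn": np.nan,
        "value": [2, 3, 4, 2, 1],
        "count": [1, 3, 4, 2, 5],
        "group": [1, 1, 2, 2, 1],
    })


# 1) Assigning a labeled DataFrame: .loc aligns on column labels, and
#    "sum"/"txn" are not columns of the right-hand side, so NaN is written.
aligned = make_frame()
m = aligned["group"] == 1
aligned.loc[m, ["sum", "txn"]] = aligned.loc[m, ["value", "count"]] + 1

# 2) Renaming the right-hand columns to the target labels lets alignment
#    place each column correctly, with no NumPy conversion needed.
renamed = make_frame()
m = renamed["group"] == 1
renamed.loc[m, ["sum", "txn"]] = (
    (renamed.loc[m, ["value", "count"]] + 1)
    .rename(columns={"value": "sum", "count": "txn"})
)

print(aligned)
print(renamed)
```

Converting with to_numpy() discards the labels entirely, so it relies on the column order of the two selections matching; renaming keeps the assignment label-based, which is safer if the column lists are ever reordered.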
EDIT: To normalize the values within each group, use GroupBy.transform:
m = df['group'] == 1
df.loc[m, ['sum','txn']] = (df[m].groupby('channel')[['value','count']]
                             .transform(lambda x: (x - x.min()) / (x.max() - x.min()))
                             .to_numpy())
print (df)
  channel  sum  txn  value  count  group
0       A  0.5  0.0      2      1      1
1       A  1.0  0.5      3      3      1
2       B  NaN  NaN      4      4      2
3       C  NaN  NaN      2      2      2
4       A  0.0  1.0      1      5      1
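Because / binds more tightly than -, a normalization written without parentheses computes something different from min-max scaling. The sketch below uses the value column of channel A from the sample data ([2, 3, 1]) to compare the two forms; min-max scaling is assumed to be the intended normalization.

```python
import pandas as pd

x = pd.Series([2, 3, 1])  # "value" for channel A in the sample data

# Without parentheses, x.max() / x.max() is evaluated first (giving 1.0),
# so the expression reduces to x - 1 - x.min().
unparenthesized = x - x.max() / x.max() - x.min()

# Min-max normalization needs explicit parentheses and maps values to [0, 1].
min_max = (x - x.min()) / (x.max() - x.min())

print(unparenthesized.tolist())  # [0.0, 1.0, -1.0]
print(min_max.tolist())          # [0.5, 1.0, 0.0]
```

If a group has a single row, or all of its values are equal, x.max() - x.min() is 0 and min-max scaling yields NaN for that group.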