How to assign DataFrame values to specific rows of a DataFrame that share the same index

Time: 2020-02-25 08:26:54

Tags: python pandas dataframe

Suppose I have

    | channel |   sum   |  txn  | value | count | group
0   |    A    |  null   |  null |   2   |   1   |  1
1   |    A    |  null   |  null |   3   |   3   |  1 
2   |    B    |  null   |  null |   4   |   4   |  2
3   |    C    |  null   |  null |   2   |   2   |  2
4   |    A    |  null   |  null |   1   |   5   |  1
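For anyone who wants to reproduce this, here is a minimal sketch of how the frame above could be built. It assumes the null cells are NaN and that the remaining columns hold plain integers; the table does not state either.

```python
import numpy as np
import pandas as pd

# Assumption: "null" in the table means NaN; all other values are copied from it.
df = pd.DataFrame({
    "channel": ["A", "A", "B", "C", "A"],
    "sum": np.nan,   # scalar is broadcast to every row
    "txn": np.nan,
    "value": [2, 3, 4, 2, 1],
    "count": [1, 3, 4, 2, 5],
    "group": [1, 1, 2, 2, 1],
})
print(df)
```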

When I use

df.loc[df['group'] == 1 ,['sum','txn']] = df.loc[df['group'] == 1].groupby(['channel'])['value','count'].apply(lambda x: x+1)

it does not assign any values to my DataFrame.

It should look like this:

    | channel |   sum   |  txn  | value | count | group
0   |    A    |    3    |   2   |   2   |   1   |  1
1   |    A    |    4    |   4   |   3   |   3   |  1 
2   |    B    |  null   |  null |   4   |   4   |  2
3   |    C    |  null   |  null |   2   |   2   |  2
4   |    A    |    2    |   6   |   1   |   5   |  1

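1 answer:

Answer 0: (score: 2)

No per-group aggregation such as a count is needed here, so the solution can be simplified by dropping the groupby. The right-hand side also has to be converted to a NumPy array: if a DataFrame is assigned, pandas tries to align its column labels (value, count) with the target columns (sum, txn), finds no match, and the values are not assigned.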

m = df['group'] == 1
df.loc[m ,['sum','txn']] = (df.loc[m, ['value','count']] + 1).to_numpy()
# older pandas versions (before 0.24, which introduced to_numpy)
#df.loc[m ,['sum','txn']] = (df.loc[m, ['value','count']] + 1).values
print (df)
  channel  sum  txn  value  count  group
0       A  3.0  2.0      2      1      1
1       A  4.0  4.0      3      3      1
2       B  NaN  NaN      4      4      2
3       C  NaN  NaN      2      2      2
4       A  2.0  6.0      1      5      1
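To make the alignment point concrete, here is a small self-contained sketch comparing three assignments on the sample data. The frame is rebuilt under the assumption that the null cells are NaN, and the names `aligned`, `fixed` and `renamed` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "channel": ["A", "A", "B", "C", "A"],
    "sum": np.nan, "txn": np.nan,          # assumption: null means NaN
    "value": [2, 3, 4, 2, 1],
    "count": [1, 3, 4, 2, 5],
    "group": [1, 1, 2, 2, 1],
})
m = df["group"] == 1
new_values = df.loc[m, ["value", "count"]] + 1

# 1) Assigning the DataFrame directly: its columns (value, count) do not
#    match the target columns (sum, txn), so the target rows stay NaN.
aligned = df.copy()
aligned.loc[m, ["sum", "txn"]] = new_values

# 2) Assigning a NumPy array: no labels, so values are placed by position.
fixed = df.copy()
fixed.loc[m, ["sum", "txn"]] = new_values.to_numpy()

# 3) Alternative: rename the columns so that label alignment succeeds.
renamed = df.copy()
renamed.loc[m, ["sum", "txn"]] = new_values.rename(
    columns={"value": "sum", "count": "txn"}
)

print(aligned[["sum", "txn"]])
print(fixed[["sum", "txn"]])
```

The array route is shorter, while the rename route keeps index alignment, which may be safer if the row order of the two sides could ever differ.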

EDIT: To normalize within each group, you can use GroupBy.transform:

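m = df['group'] == 1
# select the columns with a list (double brackets); tuple-style selection raises in pandas 2.x
# note: without parentheses the lambda evaluates as x - (x.max() / x.max()) - x.min()
df.loc[m ,['sum','txn']] = (df[m].groupby('channel')[['value','count']]
                                 .transform(lambda x: x - x.max() / x.max() - x.min())
                                 .to_numpy())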

print (df)
  channel  sum  txn  value  count  group
0       A  0.0 -1.0      2      1      1
1       A  1.0  1.0      3      3      1
2       B  NaN  NaN      4      4      2
3       C  NaN  NaN      2      2      2
4       A -1.0  3.0      1      5      1
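
As written, the lambda above computes x - 1 - x.min(), because division binds tighter than subtraction. If the goal is the usual min-max scaling, the subtractions need parentheses. The following is a sketch under that assumption, not necessarily the formula originally intended; `min_max` is a helper name introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "channel": ["A", "A", "B", "C", "A"],
    "sum": np.nan, "txn": np.nan,          # assumption: null means NaN
    "value": [2, 3, 4, 2, 1],
    "count": [1, 3, 4, 2, 5],
    "group": [1, 1, 2, 2, 1],
})
m = df["group"] == 1

def min_max(x):
    # Hypothetical helper: scales each column of each group to [0, 1].
    # A group whose max equals its min would divide by zero and yield NaN.
    return (x - x.min()) / (x.max() - x.min())

scaled = df[m].groupby("channel")[["value", "count"]].transform(min_max)
df.loc[m, ["sum", "txn"]] = scaled.to_numpy()
print(df)
```

For channel A this gives 0.5, 1.0, 0.0 in sum and 0.0, 0.5, 1.0 in txn, while the rows with group 2 stay NaN.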