Generate a column based on the values of other columns

Time: 2019-11-04 21:03:51

Tags: python pandas numpy

I have a dataframe:

Date_1      Date_2     is_B weight_1
01/09/2019  02/08/2019  1   254
01/09/2019  02/08/2019  1   320
01/09/2019  04/08/2019  1   244
01/09/2019  04/08/2019  1   247
01/09/2019  14/08/2019  0   343
01/09/2019  14/08/2019  1   161
01/09/2019  14/08/2019  1   386
01/09/2019  15/08/2019  1   465
01/09/2019  15/08/2019  1   133
01/09/2019  15/08/2019  1   310
01/09/2019  15/08/2019  1   155

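I want to generate a column new_weight such that, for each Date_1, new_weight is 5000 minus the running total of weight_1 over the rows where is_B is 1. If is_B = 0, the previous new_weight value is copied into new_weight.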

I know that to calculate new_weight we can do:

df['new_weight'] = 5000 - df.groupby('Date_1')['weight_1'].cumsum()

but I'm not sure how to apply the is_B condition in the code.

Can anyone suggest a pandas or numpy way to do this?

EDIT

Expected output

Date_1      Date_2     is_B weight_1  new_weight
01/09/2019  02/08/2019  1   254       5000-254
01/09/2019  02/08/2019  1   320       5000-254-320 
01/09/2019  04/08/2019  1   244       5000-254-320-244
01/09/2019  04/08/2019  1   247       5000-254-320-244-247
01/09/2019  14/08/2019  0   343       5000-254-320-244-247 (343 is not subtracted because is_B = 0)
01/09/2019  14/08/2019  1   161       .
01/09/2019  14/08/2019  1   386       . 
01/09/2019  15/08/2019  1   465       . 
01/09/2019  15/08/2019  1   133       .
01/09/2019  15/08/2019  1   310       .
01/09/2019  15/08/2019  1   155       .
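For the numpy half of the question, here is a minimal sketch, assuming is_B is a 0/1 flag and the starting value is 5000 as stated above. The helper name remaining_weight is hypothetical. It zeroes the weights of rows with is_B != 1 via np.where, stable-sorts by date so each date's rows become contiguous (keeping their original order), and subtracts each date block's starting offset from one global cumsum.

```python
import numpy as np


def remaining_weight(dates, is_b, weights, start=5000):
    """Hypothetical helper: start minus the per-date running sum of booked weights.

    Rows with is_b != 1 add 0, so they repeat the previous value within their date.
    """
    dates = np.asarray(dates)
    # Zero out the weights of rows that are not booked
    w = np.where(np.asarray(is_b) == 1, np.asarray(weights), 0)
    if w.size == 0:
        return w
    # Stable sort: rows of the same date become contiguous, original order kept
    order = np.argsort(dates, kind="stable")
    d_sorted, w_sorted = dates[order], w[order]
    total = np.cumsum(w_sorted)
    # True at the first row of each date block
    starts = np.r_[True, d_sorted[1:] != d_sorted[:-1]]
    block = np.cumsum(starts) - 1
    # Global running total just before each block starts
    offset = (total - w_sorted)[starts][block]
    out = np.empty_like(total)
    # Scatter the per-date results back to the original row order
    out[order] = start - (total - offset)
    return out
```

Usage would be something like df['new_weight'] = remaining_weight(df['Date_1'], df['is_B'], df['weight_1']). The sort is only needed because a per-group cumsum in plain numpy requires each group to be contiguous.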

Thanks

4 answers:

Answer 0 (score: 1)

Try this:

# Zero out weights where is_B != 1, then take the running sum within each Date_1
df['new_weight'] = 5000 - (df['weight_1'].where(df['is_B'].eq(1), 0)
                           .groupby(df['Date_1'])
                           .cumsum())
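The question's sample has only one Date_1, so it does not show that the running total restarts for each date. A small check on a hypothetical two-date frame (values made up for illustration) with the where-then-grouped-cumsum approach, including a date whose first row has is_B = 0:

```python
import pandas as pd

# Hypothetical two-date frame, only to show that the total restarts per Date_1
df = pd.DataFrame({
    "Date_1": ["01/09/2019"] * 3 + ["02/09/2019"] * 3,
    "is_B": [1, 0, 1, 0, 1, 1],
    "weight_1": [254, 343, 320, 100, 161, 386],
})

# Weights of rows with is_B != 1 become 0, so they leave the total unchanged
booked = df["weight_1"].where(df["is_B"].eq(1), 0)
# Running sum within each Date_1, subtracted from the starting value
df["new_weight"] = 5000 - booked.groupby(df["Date_1"]).cumsum()
print(df)
```

The second date starts again at 5000 rather than continuing from the first date's 4426.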

Answer 1 (score: 1)

It looks like you just need a simple multiplication before the groupby:

# is_B is 0/1, so the product keeps booked weights and zeroes the rest
df['new_weight'] = 5000 - (df['weight_1'].mul(df['is_B'])
     .groupby(df['Date_1'])
     .cumsum()
)

Output:

        Date_1      Date_2  is_B  weight_1  new_weight
0   01/09/2019  02/08/2019     1       254        4746
1   01/09/2019  02/08/2019     1       320        4426
2   01/09/2019  04/08/2019     1       244        4182
3   01/09/2019  04/08/2019     1       247        3935
4   01/09/2019  14/08/2019     0       343        3935
5   01/09/2019  14/08/2019     1       161        3774
6   01/09/2019  14/08/2019     1       386        3388
7   01/09/2019  15/08/2019     1       465        2923
8   01/09/2019  15/08/2019     1       133        2790
9   01/09/2019  15/08/2019     1       310        2480
10  01/09/2019  15/08/2019     1       155        2325

Answer 2 (score: 1)

You can use DataFrame.mask + Series.cumsum:

df['new_weight']=5000-(df.mask(df['is_B'].eq(0)).groupby('Date_1')['weight_1'].cumsum()).ffill()
print(df)

        Date_1      Date_2  is_B  weight_1  new_weight
0   01/09/2019  02/08/2019     1       254      4746.0
1   01/09/2019  02/08/2019     1       320      4426.0
2   01/09/2019  04/08/2019     1       244      4182.0
3   01/09/2019  04/08/2019     1       247      3935.0
4   01/09/2019  14/08/2019     0       343      3935.0
5   01/09/2019  14/08/2019     1       161      3774.0
6   01/09/2019  14/08/2019     1       386      3388.0
7   01/09/2019  15/08/2019     1       465      2923.0
8   01/09/2019  15/08/2019     1       133      2790.0
9   01/09/2019  15/08/2019     1       310      2480.0
10  01/09/2019  15/08/2019     1       155      2325.0
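One caveat with this version: .ffill() runs over the whole column, not per date, so if a Date_1 group begins with an is_B = 0 row it would inherit the previous date's total (and an is_B = 0 first row of the frame would stay NaN). A sketch that masks only weight_1 and forward-fills within each group, on a hypothetical frame built to hit that case:

```python
import pandas as pd

# Hypothetical frame: the second date starts with an is_B == 0 row
df = pd.DataFrame({
    "Date_1": ["01/09/2019", "01/09/2019", "02/09/2019", "02/09/2019"],
    "is_B": [1, 1, 0, 1],
    "weight_1": [254, 320, 343, 161],
})

# NaN out the weights of is_B == 0 rows (only the weight column, not the key)
booked = df["weight_1"].mask(df["is_B"].eq(0))
# Grouped cumsum skips NaN but leaves NaN at those positions
running = booked.groupby(df["Date_1"]).cumsum()
# Forward-fill inside each date; nothing booked yet means 0 subtracted
running = running.groupby(df["Date_1"]).ffill().fillna(0)
df["new_weight"] = 5000 - running
print(df)
```

With an ungrouped .ffill(), the third row would instead pick up 574 from the first date and show 4426.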

Answer 3 (score: 0)

This will give you the desired values in a new column ("new_weight"):

df.loc[df.is_B == 0, 'new_weight'] = df['weight_1']
df.loc[df.is_B == 1, 'new_weight'] = 5000 - df.groupby('Date_1')['weight_1'].cumsum()

Not sure whether this addresses "If is_B = 0, then we copy the older new_weight value into new_weight."
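Keeping the same .loc style, the copy-forward part can be handled by computing the running total over the is_B == 1 rows only and then forward-filling within each Date_1. A sketch on a hypothetical frame (the first six rows of the question's data plus a made-up second date), assuming a date with no booked rows yet should show 5000:

```python
import pandas as pd

# Hypothetical data: question rows 0-5 plus a second date starting with is_B == 0
df = pd.DataFrame({
    "Date_1": ["01/09/2019"] * 6 + ["02/09/2019"],
    "is_B": [1, 1, 1, 1, 0, 1, 0],
    "weight_1": [254, 320, 244, 247, 343, 161, 100],
})

booked = df["is_B"].eq(1)
# Running total over booked rows only, written into the booked rows
df.loc[booked, "new_weight"] = (
    5000 - df.loc[booked].groupby("Date_1")["weight_1"].cumsum()
)
# is_B == 0 rows are still NaN: copy the previous value within the same Date_1,
# and use 5000 when a date has nothing booked yet
df["new_weight"] = df.groupby("Date_1")["new_weight"].ffill().fillna(5000)
print(df)
```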