根据数据框中的约束计算列

时间:2019-10-24 06:32:39

标签: python pandas numpy

我有一个数据框

I_Code  Date_1   Date_2     C_B
4   01/09/2019  02/08/2019  7
4   01/09/2019  03/08/2019  1189
1   02/09/2019  03/08/2019  3
2   02/09/2019  03/08/2019  4
1   01/09/2019  04/08/2019  8
4   01/09/2019  04/08/2019  1650
1   02/09/2019  04/08/2019  138
2   02/09/2019  04/08/2019  8
1   03/09/2019   04/08/2019 6
2   03/09/2019  04/08/2019  2
1   01/09/2019  05/08/2019  9
4   01/09/2019  05/08/2019  1805
1   02/09/2019  05/08/2019  259
2   02/09/2019  05/08/2019  1062
1   03/09/2019  05/08/2019  4
2   03/09/2019  05/08/2019  161
2   04/09/2019  05/08/2019  7
5   04/09/2019  05/08/2019  12
1   01/09/2019  06/08/2019  63
2   01/09/2019  06/08/2019  2

我想计算为每个date_1和I_code计算的列c_b1,这样对于每个date_1和I_Code我想为下一个条目减去C_b。例如-

I_Code  Date_1   Date_2        C_B    C_B1
    4   01/09/2019  02/08/2019  7      7
    4   01/09/2019  03/08/2019  1189   1189-7 (Since date_1 and I_Code are same in entry 1)
    1   02/09/2019  03/08/2019  3       3
    2   02/09/2019  03/08/2019  4       4
    1   01/09/2019  04/08/2019  8       8
    4   01/09/2019  04/08/2019  1650    1650-1189 (Since date_1 and I_Code are same in entry 2)
    1   02/09/2019  04/08/2019  138     138-3
    2   02/09/2019  04/08/2019  8       8-3

任何人都可以帮助实现相同目标。

谢谢

2 个答案:

答案 0 :(得分:2)

使用DataFrameGroupBy.shift,将缺失值替换为0并减去:

df['C_B1'] = df['C_B'] - df.groupby(['I_Code','Date_1'])['C_B'].shift().fillna(0)

使用Series.subfill_value=0参数的替代方法:

df['C_B1'] = df['C_B'].sub(df.groupby(['I_Code','Date_1'])['C_B'].shift(), fill_value=0)

print (df)
    I_Code      Date_1      Date_2   C_B    C_B1
0        4  01/09/2019  02/08/2019     7     7.0
1        4  01/09/2019  03/08/2019  1189  1182.0
2        1  02/09/2019  03/08/2019     3     3.0
3        2  02/09/2019  03/08/2019     4     4.0
4        1  01/09/2019  04/08/2019     8     8.0
5        4  01/09/2019  04/08/2019  1650   461.0
6        1  02/09/2019  04/08/2019   138   135.0
7        2  02/09/2019  04/08/2019     8     4.0
8        1  03/09/2019  04/08/2019     6     6.0
9        2  03/09/2019  04/08/2019     2     2.0
10       1  01/09/2019  05/08/2019     9     1.0
11       4  01/09/2019  05/08/2019  1805   155.0
12       1  02/09/2019  05/08/2019   259   121.0
13       2  02/09/2019  05/08/2019  1062  1054.0
14       1  03/09/2019  05/08/2019     4    -2.0
15       2  03/09/2019  05/08/2019   161   159.0
16       2  04/09/2019  05/08/2019     7     7.0
17       5  04/09/2019  05/08/2019    12    12.0
18       1  01/09/2019  06/08/2019    63    54.0
19       2  01/09/2019  06/08/2019     2     2.0

答案 1 :(得分:1)

IIUC:

df["C_B_1"] = df.groupby(["Date_1","I_Code"])["C_B"].apply(lambda x: x-x.shift()).fillna(df['C_B'])

print (df)
#
    I_Code      Date_1      Date_2   C_B   C_B_1
0        4  01/09/2019  02/08/2019     7     7.0
1        4  01/09/2019  03/08/2019  1189  1182.0
2        1  02/09/2019  03/08/2019     3     3.0
3        2  02/09/2019  03/08/2019     4     4.0
4        1  01/09/2019  04/08/2019     8     8.0
5        4  01/09/2019  04/08/2019  1650   461.0
6        1  02/09/2019  04/08/2019   138   135.0
7        2  02/09/2019  04/08/2019     8     4.0
8        1  03/09/2019  04/08/2019     6     6.0
9        2  03/09/2019  04/08/2019     2     2.0
10       1  01/09/2019  05/08/2019     9     1.0
11       4  01/09/2019  05/08/2019  1805   155.0
12       1  02/09/2019  05/08/2019   259   121.0
13       2  02/09/2019  05/08/2019  1062  1054.0
14       1  03/09/2019  05/08/2019     4    -2.0
15       2  03/09/2019  05/08/2019   161   159.0
16       2  04/09/2019  05/08/2019     7     7.0
17       5  04/09/2019  05/08/2019    12    12.0
18       1  01/09/2019  06/08/2019    63    54.0
19       2  01/09/2019  06/08/2019     2     2.0