Question

I have the following data frame of attempted spendings (transactions) by different users. Each attempt has a date and an amount.

user  date amount
1     1    6    
1     2    5    
1     3    2    
1     4    3    
1     5    1    
2     1    11    
2     2    12    
2     3    5    
2     4    8    
2     5    1

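Say I want to impose an arbitrary limit on total spending and check which transactions go through (because the user has not exceeded the limit) and which do not. Assuming the limit is 10, the desired result would be: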

user  date amount approved spent remaining_credit
1     1    6      1        6     4
1     2    5      0        6     4
1     3    2      1        8     2
1     4    3      0        8     2
1     5    1      1        9     1
2     1    11     0        0     10
2     2    12     0        0     10
2     3    5      1        5     5
2     4    8      0        5     5
2     5    1      1        6     4

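Any way of computing any one of the last three columns would solve my problem.
The first (approved, column 4) is 1 whenever the amount of the attempt is less than the limit minus the sum of the amounts already spent.
The second (spent) holds the cumulative spending of the approved transactions.
The third (remaining_credit) holds the credit remaining after each attempted spending.
I tried: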

d1['spent'] = d1.sort_values('date').groupby('user')['amount'].cumsum()
d1['spent'] = d1.sort_values(['user','date']).spent.mask(d1.spent > limit).fillna(method='pad')

But I don't know how to restart the cumulative sum once a later attempt no longer exceeds the limit.

Answer 1

This can be done by writing your own function that loops over the data to build each column, and then calling it through groupby.apply:

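import numpy as np
import pandas as pd

def calcul_spendings (ser, val_max=1):
    arr_am = ser.to_numpy()
    arr_sp = np.cumsum(arr_am) # running total as if every attempt were approved
    arr_ap = np.zeros(len(ser))
    for i in range(len(arr_am)):
        if arr_sp[i]>val_max: # check if this attempt would exceed the limit
            arr_sp[i:] -= arr_am[i] # rejected: remove its amount from this and all later totals
        else:
            arr_ap[i] = 1
    return pd.DataFrame({'approved':arr_ap, 
                         'spent': arr_sp, 
                         'remaining_credit':val_max-arr_sp}, 
                        index=ser.index)

# group_keys=False keeps the original row index, so the result aligns with df on assignment
df[['approved','spent','remaining_credit']] = df.sort_values('date').groupby('user', group_keys=False)['amount'].apply(calcul_spendings, val_max=10)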
print (df)
   user  date  amount  approved  spent  remaining_credit
0     1     1       6       1.0      6                 4
1     1     2       5       0.0      6                 4
2     1     3       2       1.0      8                 2
3     1     4       3       0.0      8                 2
4     1     5       1       1.0      9                 1
5     2     1      11       0.0      0                10
6     2     2      12       0.0      0                10
7     2     3       5       1.0      5                 5
8     2     4       8       0.0      5                 5
9     2     5       1       1.0      6                 4
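
For comparison, here is a minimal sketch of the same approval rule in plain Python, without pandas. It is not the code from the answer above: instead of correcting a cumulative sum, it keeps a running total of approved amounts only. The names approve_with_limit, rows and results are made up for this illustration, and it assumes, like the answer's `>` test, that an attempt that lands exactly on the limit is approved.

```python
from collections import defaultdict

def approve_with_limit(amounts, limit):
    """Apply the spending limit to one user's amounts, given in date order.

    Returns three lists: approved flags, cumulative approved spending,
    and remaining credit after each attempt.
    """
    approved, spent, remaining = [], [], []
    total = 0  # sum of approved amounts so far
    for amount in amounts:
        if total + amount <= limit:  # assumption: reaching the limit exactly is allowed
            total += amount
            approved.append(1)
        else:
            approved.append(0)  # rejected attempts leave the total unchanged
        spent.append(total)
        remaining.append(limit - total)
    return approved, spent, remaining

# (user, date, amount) rows from the question
rows = [
    (1, 1, 6), (1, 2, 5), (1, 3, 2), (1, 4, 3), (1, 5, 1),
    (2, 1, 11), (2, 2, 12), (2, 3, 5), (2, 4, 8), (2, 5, 1),
]

# Group the amounts per user, ordered by date
by_user = defaultdict(list)
for user, date, amount in sorted(rows):
    by_user[user].append(amount)

results = {user: approve_with_limit(amounts, limit=10)
           for user, amounts in by_user.items()}

for user, (approved, spent, remaining) in results.items():
    print(user, approved, spent, remaining)
```

Because a rejected attempt never touches the running total, there is nothing to restart: the next attempt is simply compared against whatever has been approved so far.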
