Question

DataFrame:

ID spend month_diff    
12  10    -5
12  10    -4
12  10    -3
12  10     1         
12  10    -2         
12  20     0        
12  30     2         
12  10    -1

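I want to compute spend_total for a given ID based on month_diff. A negative month_diff means the customer's spend occurred last year, and a positive month_diff means the spend occurred this year. I want to compare each customer's spend for the past year against this year, under the following conditions:

if month_diff >= -2 and < 0, then the cumulative spend over the negative months -> flag=pre
if month_diff > 0 and <= 2, then the cumulative spend over the positive months -> flag=post

Note: the number of positive and negative month_diff values may differ. For example, a customer may have made 4 transactions with a negative month_diff but only 2 with a positive month_diff. In that case I want the cumulative spend for only 2 months on the negative side and for the 2 months on the positive side, and I do not want to count spend where month_diff is 0.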

Desired DataFrame:

ID spend month_diff spend_tot   flag    
12  10    -2         20         pre
12  30     2         40        post

40 is the cumulative spend for month_diff +1 and +2 (i.e. 10 + 30); likewise, the cumulative spend for month_diff -1 and -2 is 20 (i.e. 10 + 10).

Answer 1

Use:

import numpy as np

#filter values by list
df = df[df['month_diff'].isin([1,2,-1,-2])]

#filter duplicated values with absolute values of month_diff
df = df[df.assign(a=df['month_diff'].abs()).duplicated(['ID','a'], keep=False)]
#sign column
a = np.sign(df['month_diff'])
#aggregate sum and last
df1 = (df.groupby(['ID', a])
         .agg({'month_diff':'last', 'spend':'sum'})
         .reset_index(level=1, drop=True)
         .reset_index())

df1['flag'] = np.select([df1['month_diff'].ge(-2) & df1['month_diff'].lt(0),
                         df1['month_diff'].gt(0) & df1['month_diff'].le(2)], 
                         ['pre','post'], default='another val')
print (df1)
   ID  month_diff  spend  flag
0  12          -1     20   pre
1  12           2     40  post
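
Note that 'last' returns whichever row comes last in the frame, so the output above shows month_diff -1 for the pre group, whereas the desired output in the question shows -2. The following is a minimal, self-contained sketch of a variation on the same idea, not the answerer's code. It assumes that "matching" months means months whose absolute month_diff appears on both sides for the same ID, and that the row to report is the one furthest from month 0. The helper name pre_post_totals, its window parameter, and the abs_month column are hypothetical names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [12] * 8,
    "spend": [10, 10, 10, 10, 10, 20, 30, 10],
    "month_diff": [-5, -4, -3, 1, -2, 0, 2, -1],
})


def pre_post_totals(frame, window=2):
    """Hypothetical helper: total spend over matching pre/post months."""
    # Keep months within the window on either side; month 0 is excluded.
    out = frame[frame["month_diff"].abs().between(1, window)].copy()
    out["abs_month"] = out["month_diff"].abs()

    # Assumption: keep a month only if both -m and +m exist for the same ID.
    both_sides = out.groupby(["ID", "abs_month"])["month_diff"].transform("nunique") == 2
    out = out[both_sides]

    # Negative months are last year's spend (pre), positive are this year's (post).
    out["flag"] = np.where(out["month_diff"] < 0, "pre", "post")

    # Sort so that 'last' picks the month furthest from 0 in each group.
    out = out.sort_values(["ID", "abs_month"])
    result = (out.groupby(["ID", "flag"])
                 .agg(month_diff=("month_diff", "last"),
                      spend_tot=("spend", "sum"))
                 .reset_index())

    # Put the pre row before the post row and match the requested column order.
    result = result.sort_values(["ID", "month_diff"]).reset_index(drop=True)
    return result[["ID", "month_diff", "spend_tot", "flag"]]


result = pre_post_totals(df)
print(result)
```

For the sample data this should give one pre row (month_diff -2, spend_tot 20) and one post row (month_diff 2, spend_tot 40). Unlike the desired output in the question, the per-row spend column is not carried over, since it is ambiguous after aggregation.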

Group totals based on month difference
