Filter the top n rows per group based on a grouping condition

Time: 2019-03-27 09:46:49

Tags: python pandas-groupby

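I have a DataFrame with 4 columns: User_id, Transaction_id, product and datetime. For each user, I need to select their n most recent transactions. Suppose n = 2, and my DataFrame looks like this: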

    transaction_id  user_id  product  date
         T1             U1     P1     2019-03-27
         T1             U1     P2     2019-03-27
         T1             U1     P3     2019-03-27
         T2             U1     P2     2019-03-21
         T2             U1     P3     2019-03-21
         T3             U1     P2     2019-03-20

I tried to do this with the help of the question "group by pandas dataframe and select latest in each group".

My expected output is:

   transaction_id   user_id  product  date
        T1            U1       P1     2019-03-27
        T1            U1       P2     2019-03-27
        T1            U1       P3     2019-03-27
        T2            U1       P2     2019-03-21
        T2            U1       P3     2019-03-21

1 Answer:

Answer 0: (score: 1)

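The idea is to first remove duplicate (user_id, date) pairs with DataFrame.drop_duplicates, take the top 2 dates for each group, and then use DataFrame.merge to join them back to the original DataFrame: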

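import pandas as pd

# Per user, take the 2 most recent distinct dates, then merge them back
# onto the original frame to recover every row for those dates.
df = (df.merge(df.drop_duplicates(['user_id','date'])
                 .sort_values('date',ascending=False)
                 .groupby('user_id')
                 .head(2)[['user_id','date']])
       )
print(df)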
  transaction_id user_id product       date
0             T1      U1      P1 2019-03-27
1             T1      U1      P2 2019-03-27
2             T1      U1      P3 2019-03-27
3             T2      U1      P2 2019-03-21
4             T2      U1      P3 2019-03-21
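
For completeness, the same approach can be sketched as a self-contained script with n as a parameter. The helper name `latest_n_dates_per_user` is hypothetical, the sample frame is rebuilt from the question's table, and converting `date` with `pd.to_datetime` is an assumption (the question does not state the column's dtype). The merge keys are passed explicitly via `on=` rather than inferred.

```python
import pandas as pd


def latest_n_dates_per_user(df, n=2):
    """Keep every row that falls on one of each user's n most recent dates.

    Hypothetical helper wrapping the drop_duplicates / head / merge idea.
    """
    keys = (df.drop_duplicates(['user_id', 'date'])   # one row per (user, date)
              .sort_values('date', ascending=False)   # newest dates first
              .groupby('user_id')
              .head(n)[['user_id', 'date']])          # n newest dates per user
    # The inner merge keeps only the original rows matching the selected dates.
    return df.merge(keys, on=['user_id', 'date'])


# Sample data copied from the question; datetime dtype is an assumption.
df = pd.DataFrame({
    'transaction_id': ['T1', 'T1', 'T1', 'T2', 'T2', 'T3'],
    'user_id': ['U1'] * 6,
    'product': ['P1', 'P2', 'P3', 'P2', 'P3', 'P2'],
    'date': pd.to_datetime(['2019-03-27'] * 3
                           + ['2019-03-21'] * 2
                           + ['2019-03-20']),
})

result = latest_n_dates_per_user(df, n=2)
print(result)
```

Note that this selects by distinct date, not by distinct transaction_id. If a user can have several transactions on the same date, they would count as a single "transaction" here; in that case deduplicating and merging on transaction_id instead may be closer to what the question asks.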