I have a DataFrame with four columns: user_id, transaction_id, product and date. For each user I need to select their n most recent transactions. Taking n = 2, my DataFrame looks like this:
transaction_id user_id product date
T1 U1 P1 2019-03-27
T1 U1 P2 2019-03-27
T1 U1 P3 2019-03-27
T2 U1 P2 2019-03-21
T2 U1 P3 2019-03-21
T3 U1 P2 2019-03-20
I tried to do this with the help of the question "group by pandas dataframe and select latest in each group".
The output I expect is:
transaction_id user_id product date
T1 U1 P1 2019-03-27
T1 U1 P2 2019-03-27
T1 U1 P3 2019-03-27
T2 U1 P2 2019-03-21
T2 U1 P3 2019-03-21
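Answer 0 (score: 1)
The idea is to first remove duplicates with DataFrame.drop_duplicates, take the top 2 values of each group, and then use DataFrame.merge to join the result back to the original DataFrame: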
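# One row per (user_id, date), newest first, then the top 2 dates per user.
latest = (df.drop_duplicates(['user_id', 'date'])
            .sort_values('date', ascending=False)
            .groupby('user_id')
            .head(2)[['user_id', 'date']])
# The inner merge keeps only the rows that fall on those dates.
df = df.merge(latest, on=['user_id', 'date'])
print(df)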
transaction_id user_id product date
0 T1 U1 P1 2019-03-27
1 T1 U1 P2 2019-03-27
2 T1 U1 P3 2019-03-27
3 T2 U1 P2 2019-03-21
4 T2 U1 P3 2019-03-21
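The answer above identifies a transaction by its (user_id, date) pair, which works for the sample data because each transaction falls on its own date. As a minimal sketch of the same drop_duplicates / groupby / head / merge idea, the variant below keys on transaction_id instead, so two transactions made on the same date would still count separately. The function name last_n_transactions and the conversion of date with pd.to_datetime are assumptions made for illustration; they are not part of the original answer.

```python
import pandas as pd


def last_n_transactions(df: pd.DataFrame, n: int = 2) -> pd.DataFrame:
    """Return all rows belonging to each user's n most recent transactions.

    Hypothetical helper: assumes every row of a transaction carries the
    same date, as in the sample data.
    """
    # One row per (user_id, transaction_id), keeping that transaction's date.
    keys = df.drop_duplicates(["user_id", "transaction_id"])[
        ["user_id", "transaction_id", "date"]
    ]
    # Newest first, then the first n transactions of each user.
    latest = (
        keys.sort_values("date", ascending=False)
        .groupby("user_id")
        .head(n)[["user_id", "transaction_id"]]
    )
    # The inner merge keeps only the rows of the selected transactions.
    return df.merge(latest, on=["user_id", "transaction_id"])


# Sample data from the question.
df = pd.DataFrame(
    {
        "transaction_id": ["T1", "T1", "T1", "T2", "T2", "T3"],
        "user_id": ["U1"] * 6,
        "product": ["P1", "P2", "P3", "P2", "P3", "P2"],
        "date": pd.to_datetime(
            ["2019-03-27", "2019-03-27", "2019-03-27",
             "2019-03-21", "2019-03-21", "2019-03-20"]
        ),
    }
)

result = last_n_transactions(df, n=2)
print(result)
```

With the sample data this should keep the three T1 rows and the two T2 rows and drop T3. If ties on date between different transactions matter, an extra sort key (for example a timestamp with a time component) would be needed to make the choice deterministic.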