I have an ads DataFrame that looks like this:
user_id session_id timestamp
141.0 1.0 20190418 02:23:56.000
141.0 2.0 20190416 19:51:57.000
141.0 3.0 20190415 14:47:53.000
121.0 4.0 20190414 13:57:55.000
121.0 5.0 20190414 06:23:01.000
121.0 6.0 20190412 15:32:57.000
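The timestamps above have the form `YYYYMMDD HH:MM:SS.fff`. Any 24-hour comparison needs them as pandas datetimes rather than strings. Below is a minimal sketch that rebuilds the sample frame by hand and assumes the column currently holds strings:

```python
import pandas as pd

# Sample data copied from the table above; timestamps assumed to be strings
df = pd.DataFrame({
    "user_id": [141.0, 141.0, 141.0, 121.0, 121.0, 121.0],
    "session_id": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "timestamp": [
        "20190418 02:23:56.000", "20190416 19:51:57.000", "20190415 14:47:53.000",
        "20190414 13:57:55.000", "20190414 06:23:01.000", "20190412 15:32:57.000",
    ],
})

# Parse "YYYYMMDD HH:MM:SS.fff" into datetime64 so timedelta arithmetic works
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y%m%d %H:%M:%S.%f")
print(df.dtypes)
```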
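I'm trying to apply a lambda function with a groupby that, for each user_id, counts the number of sessions in the 24 hours preceding each session's timestamp.
The result should be: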
user_id session_id timestamp 24-HourCount
141.0 1.0 20190418 02:23:56.000 0
141.0 2.0 20190416 19:51:57.000 0
141.0 3.0 20190415 14:47:53.000 na
121.0 4.0 20190414 13:57:55.000 3
121.0 5.0 20190414 06:23:01.000 1
121.0 6.0 20190413 15:32:57.000 na
I tried grouping and counting the rows (all sessions are distinct values), but I get an error:
import datetime as dt
# counts each user's sessions within 24 hours of that user's latest session
df['24-HourCount'] = df.groupby('user_id')['timestamp'].transform(
    lambda x: x.between(x.max() - dt.timedelta(days=1), x.max()).sum())
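The lambda above compares every row against `x.max()`, the latest timestamp in the group, so every row of a user gets the same value. Here is a minimal per-row sketch. It assumes `timestamp` is already parsed to datetime, and it reads "the last 24 hours" as the half-open window [t - 24h, t), so a session does not count itself. `count_prev_24h` is a hypothetical helper name:

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [141.0, 141.0, 141.0, 121.0, 121.0, 121.0],
    "session_id": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "timestamp": pd.to_datetime([
        "20190418 02:23:56.000", "20190416 19:51:57.000", "20190415 14:47:53.000",
        "20190414 13:57:55.000", "20190414 06:23:01.000", "20190412 15:32:57.000",
    ], format="%Y%m%d %H:%M:%S.%f"),
})

def count_prev_24h(ts):
    # Hypothetical helper: for each timestamp t in one user's sessions,
    # count that user's sessions falling in [t - 24h, t)
    window = pd.Timedelta(hours=24)
    return ts.apply(lambda t: int(((ts >= t - window) & (ts < t)).sum()))

# transform keeps the original row index, so the result lines up with df
df["24-HourCount"] = df.groupby("user_id")["timestamp"].transform(count_prev_24h)
print(df)
```

Each row is compared against every other row of the same user. The cost is therefore quadratic in the number of sessions per user, which is fine for small groups.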
I also tried applying a function:
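def func(ts):
    # count this user's sessions in the 24 hours up to the group's first timestamp
    return ts.between(ts.iloc[0] - dt.timedelta(days=1), ts.iloc[0]).sum()

# apply returns one value per user_id, so map it back onto the rows
df['24-HourCount'] = df['user_id'].map(df.groupby('user_id')['timestamp'].apply(func))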
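The per-row lambda can also be replaced with a vectorized sketch. It assumes timestamps are parsed datetimes and distinct within each user. Each user's timestamps are sorted, and `numpy.searchsorted` finds, for every t, the position of the first timestamp at or after t - 24h. The gap between that position and t's own position is the number of earlier sessions in [t - 24h, t). `prev_24h_counts` is a hypothetical helper name. The final `mask` step is optional: it marks each user's earliest session as NaN, matching the `na` in the expected output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "user_id": [141.0, 141.0, 141.0, 121.0, 121.0, 121.0],
    "session_id": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "timestamp": pd.to_datetime([
        "20190418 02:23:56.000", "20190416 19:51:57.000", "20190415 14:47:53.000",
        "20190414 13:57:55.000", "20190414 06:23:01.000", "20190412 15:32:57.000",
    ], format="%Y%m%d %H:%M:%S.%f"),
})

def prev_24h_counts(ts):
    # Hypothetical helper: ts holds one user's timestamps (assumed distinct)
    values = ts.to_numpy()
    order = np.argsort(values)                 # row positions in ascending time order
    sorted_ts = values[order]
    # position of the first timestamp >= t - 24h, for every t in sorted order
    start = np.searchsorted(sorted_ts, sorted_ts - np.timedelta64(24, "h"), side="left")
    counts_sorted = np.arange(len(sorted_ts)) - start  # sessions in [t - 24h, t)
    counts = np.empty_like(counts_sorted)
    counts[order] = counts_sorted              # map back to the original row order
    return pd.Series(counts, index=ts.index)

df["24-HourCount"] = df.groupby("user_id")["timestamp"].transform(prev_24h_counts)

# Optional: show NaN for each user's earliest session, which has no prior window
first = df["timestamp"].eq(df.groupby("user_id")["timestamp"].transform("min"))
df["24-HourCount"] = df["24-HourCount"].mask(first)
print(df)
```

Sorting makes each group O(n log n) rather than quadratic. The `mask` call upcasts the column to float so that it can hold NaN.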
Thanks!