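In this table, I want to find the average number of days between two consecutive actions for each user.
What I mean is: I want to group by user_id and then, within each user, subtract from each date the date directly before it (giving a number of days per gap). Then I want the average of those gaps for each user (the mean of each user's No_Action days).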
+---------+-----------+----------------------+
| User_ID | Action_ID | Action_At |
+---------+-----------+----------------------+
| 1 | 11 | 2019-01-31T23:00:37Z |
+---------+-----------+----------------------+
| 2 | 12 | 2019-01-31T23:11:12Z |
+---------+-----------+----------------------+
| 3 | 13 | 2019-01-31T23:14:53Z |
+---------+-----------+----------------------+
| 1 | 14 | 2019-02-01T00:00:30Z |
+---------+-----------+----------------------+
| 2 | 15 | 2019-02-01T00:01:03Z |
+---------+-----------+----------------------+
| 3 | 16 | 2019-02-01T00:02:32Z |
+---------+-----------+----------------------+
| 1 | 17 | 2019-02-06T11:30:28Z |
+---------+-----------+----------------------+
| 2 | 18 | 2019-02-06T11:30:28Z |
+---------+-----------+----------------------+
| 3 | 19 | 2019-02-07T09:09:16Z |
+---------+-----------+----------------------+
| 1 | 20 | 2019-02-11T15:37:24Z |
+---------+-----------+----------------------+
| 2 | 21 | 2019-02-18T10:02:07Z |
+---------+-----------+----------------------+
| 3 | 22 | 2019-02-26T12:01:31Z |
+---------+-----------+----------------------+
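The procedure described above (group by user, subtract each timestamp's predecessor, average the gaps) can be sketched with only the standard library. This is a minimal sketch that assumes every timestamp has the exact `YYYY-MM-DDTHH:MM:SSZ` form shown in the table; `rows` and `mean_gap_per_user` are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# The sample rows from the table: (User_ID, Action_ID, Action_At)
rows = [
    (1, 11, "2019-01-31T23:00:37Z"), (2, 12, "2019-01-31T23:11:12Z"),
    (3, 13, "2019-01-31T23:14:53Z"), (1, 14, "2019-02-01T00:00:30Z"),
    (2, 15, "2019-02-01T00:01:03Z"), (3, 16, "2019-02-01T00:02:32Z"),
    (1, 17, "2019-02-06T11:30:28Z"), (2, 18, "2019-02-06T11:30:28Z"),
    (3, 19, "2019-02-07T09:09:16Z"), (1, 20, "2019-02-11T15:37:24Z"),
    (2, 21, "2019-02-18T10:02:07Z"), (3, 22, "2019-02-26T12:01:31Z"),
]

def mean_gap_per_user(rows):
    """Hypothetical helper: {user_id: mean timedelta between consecutive actions}."""
    by_user = defaultdict(list)
    for user_id, _action_id, ts in rows:
        # Assumes the fixed "...Z" format; all timestamps are treated as UTC
        by_user[user_id].append(datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ"))
    result = {}
    for user_id, times in by_user.items():
        times.sort()  # "previous" means previous in time, not in row order
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        if gaps:  # a user with a single action has no gap to average
            result[user_id] = sum(gaps, timedelta()) / len(gaps)
    return result

for user_id, gap in sorted(mean_gap_per_user(rows).items()):
    print(user_id, gap)
```

Dividing a `timedelta` by an integer keeps the result as a `timedelta` (rounded to whole microseconds), so no manual unit conversion is needed.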
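Answer (score: 2)
You can do it like this (next time, please include the data in a form that is easy to copy; typing it in took me longer than writing the solution):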
import pandas as pd

df = pd.DataFrame({'User_ID': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'Action_ID': [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22],
                   'Action_At': ['2019-01-31T23:00:37Z', '2019-01-31T23:11:12Z', '2019-01-31T23:14:53Z', '2019-02-01T00:00:30Z', '2019-02-01T00:01:03Z', '2019-02-01T00:02:32Z', '2019-02-06T11:30:28Z', '2019-02-06T11:30:28Z', '2019-02-07T09:09:16Z', '2019-02-11T15:37:24Z', '2019-02-18T10:02:07Z', '2019-02-26T12:01:31Z']})
# Parse the ISO-8601 strings into (UTC) timestamps
df.Action_At = pd.to_datetime(df.Action_At)
# Per user: gap to the previous row (rows are already in time order), then the mean gap
df.groupby('User_ID').apply(lambda x: (x.Action_At - x.Action_At.shift()).mean())
## User_ID
## 1 3 days 13:32:15.666666
## 2 5 days 19:36:58.333333
## 3 8 days 12:15:32.666666
## dtype: timedelta64[ns]
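The same mean gap can also be computed without `apply`, using `groupby(...).diff()` on the timestamp column. This is a sketch under one extra assumption: the rows might not already be in time order, so they are sorted first (`shift()` compares with the previous *row*, which only equals the previous *action* when the data is sorted). The column name `gap` is a hypothetical choice.

```python
import pandas as pd

# Same sample data as in the question (Action_ID omitted, it is not needed here)
df = pd.DataFrame({
    "User_ID": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
    "Action_At": pd.to_datetime([
        "2019-01-31T23:00:37Z", "2019-01-31T23:11:12Z", "2019-01-31T23:14:53Z",
        "2019-02-01T00:00:30Z", "2019-02-01T00:01:03Z", "2019-02-01T00:02:32Z",
        "2019-02-06T11:30:28Z", "2019-02-06T11:30:28Z", "2019-02-07T09:09:16Z",
        "2019-02-11T15:37:24Z", "2019-02-18T10:02:07Z", "2019-02-26T12:01:31Z",
    ]),
})

# Sort so that "previous action" means previous in time within each user
df = df.sort_values(["User_ID", "Action_At"])
# Per-user difference to the previous row; each user's first action gets NaT
df["gap"] = df.groupby("User_ID")["Action_At"].diff()
# mean() skips NaT, so only the real gaps are averaged
mean_gap = df.groupby("User_ID")["gap"].mean()
print(mean_gap)
```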
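Or, if you want the result as a number of days (note that `.dt.days` keeps only the whole-day part of each gap, so the remaining hours are dropped before averaging):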
df.groupby('User_ID').apply(lambda x: (x.Action_At - x.Action_At.shift()).dt.days.mean())
## User_ID
## 1 3.333333
## 2 5.333333
## 3 8.333333
## dtype: float64
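If fractions of a day should count, one option is to convert each gap with `.dt.total_seconds()` and divide by 86400 instead of using `.dt.days`. A minimal sketch, again sorting by time first as an assumption; the names `gaps`, `gap_days` and `mean_days` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "User_ID": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
    "Action_At": [
        "2019-01-31T23:00:37Z", "2019-01-31T23:11:12Z", "2019-01-31T23:14:53Z",
        "2019-02-01T00:00:30Z", "2019-02-01T00:01:03Z", "2019-02-01T00:02:32Z",
        "2019-02-06T11:30:28Z", "2019-02-06T11:30:28Z", "2019-02-07T09:09:16Z",
        "2019-02-11T15:37:24Z", "2019-02-18T10:02:07Z", "2019-02-26T12:01:31Z",
    ],
})
df["Action_At"] = pd.to_datetime(df["Action_At"])
df = df.sort_values(["User_ID", "Action_At"])

# Gap to the previous action of the same user (NaT for each user's first action)
gaps = df.groupby("User_ID")["Action_At"].diff()
# total_seconds() keeps fractions of a day; .dt.days would truncate each gap
gap_days = gaps.dt.total_seconds() / 86400
# NaN (from NaT) is skipped by mean(), leaving the average of the real gaps
mean_days = gap_days.groupby(df["User_ID"]).mean()
print(mean_days.round(2))
```

With the sample data this gives about 3.56, 5.82 and 8.51 days instead of 3.33, 5.33 and 8.33: in the truncated version every user's first gap (under an hour) counts as 0 days, and the longer gaps lose their hours.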