Complex conditional aggregation in pandas

Time: 2020-01-26 17:39:13

Tags: python-3.x pandas

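From the table below, I want to find, for each user, the average number of days between two consecutive actions.

That is, I want to group by User_ID, then subtract from each date the date immediately before it (giving the number of days between actions, per user), and then take the mean of those gaps for each user (the average of each user's No_Action days).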

+---------+-----------+----------------------+
| User_ID | Action_ID | Action_At            |
+---------+-----------+----------------------+
| 1       | 11        | 2019-01-31T23:00:37Z |
+---------+-----------+----------------------+
| 2       | 12        | 2019-01-31T23:11:12Z |
+---------+-----------+----------------------+
| 3       | 13        | 2019-01-31T23:14:53Z |
+---------+-----------+----------------------+
| 1       | 14        | 2019-02-01T00:00:30Z |
+---------+-----------+----------------------+
| 2       | 15        | 2019-02-01T00:01:03Z |
+---------+-----------+----------------------+
| 3       | 16        | 2019-02-01T00:02:32Z |
+---------+-----------+----------------------+
| 1       | 17        | 2019-02-06T11:30:28Z |
+---------+-----------+----------------------+
| 2       | 18        | 2019-02-06T11:30:28Z |
+---------+-----------+----------------------+
| 3       | 19        | 2019-02-07T09:09:16Z |
+---------+-----------+----------------------+
| 1       | 20        | 2019-02-11T15:37:24Z |
+---------+-----------+----------------------+
| 2       | 21        | 2019-02-18T10:02:07Z |
+---------+-----------+----------------------+
| 3       | 22        | 2019-02-26T12:01:31Z |
+---------+-----------+----------------------+

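1 answer:

Answer 0 (score: 2)

You can do it like this (next time, please provide the data in a form that is easy to copy; typing in the data took me longer than writing the solution):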

import pandas as pd

df = pd.DataFrame({'User_ID': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'Action_ID': [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22],
                   'Action_At': ['2019-01-31T23:00:37Z', '2019-01-31T23:11:12Z', '2019-01-31T23:14:53Z', '2019-02-01T00:00:30Z', '2019-02-01T00:01:03Z', '2019-02-01T00:02:32Z', '2019-02-06T11:30:28Z', '2019-02-06T11:30:28Z', '2019-02-07T09:09:16Z', '2019-02-11T15:37:24Z', '2019-02-18T10:02:07Z', '2019-02-26T12:01:31Z']})

# Parse the ISO 8601 strings into datetimes so they can be subtracted
df.Action_At = pd.to_datetime(df.Action_At)

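# Per user: subtract the previous timestamp from each timestamp, then average the gaps.
# The first row of each user has no predecessor (NaT) and is skipped by mean().
df.groupby('User_ID').apply(lambda x: (x.Action_At - x.Action_At.shift()).mean())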

## User_ID
## 1   3 days 13:32:15.666666
## 2   5 days 19:36:58.333333
## 3   8 days 12:15:32.666666
## dtype: timedelta64[ns]
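
Note that the `shift()` approach relies on each user's rows already being in chronological order, as they are in the question's table. The following is only a sketch for the case where that is not guaranteed: it sorts by `Action_At` first and uses `Series.diff()`, which is equivalent to subtracting the shifted column. The shuffling step and the names `shuffled` and `mean_gap` are assumptions added for illustration, not part of the original answer.

```python
import pandas as pd

# Same sample data as above, reduced to the two columns the calculation needs.
df = pd.DataFrame({
    "User_ID": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
    "Action_At": [
        "2019-01-31T23:00:37Z", "2019-01-31T23:11:12Z", "2019-01-31T23:14:53Z",
        "2019-02-01T00:00:30Z", "2019-02-01T00:01:03Z", "2019-02-01T00:02:32Z",
        "2019-02-06T11:30:28Z", "2019-02-06T11:30:28Z", "2019-02-07T09:09:16Z",
        "2019-02-11T15:37:24Z", "2019-02-18T10:02:07Z", "2019-02-26T12:01:31Z",
    ],
})
df["Action_At"] = pd.to_datetime(df["Action_At"])

# Shuffle the rows to mimic unsorted input (hypothetical, for illustration only).
shuffled = df.sample(frac=1, random_state=0)

# Sort chronologically, then take the mean gap between consecutive actions per user.
mean_gap = (
    shuffled.sort_values("Action_At")
    .groupby("User_ID")["Action_At"]
    .apply(lambda s: s.diff().mean())
)
print(mean_gap)
```

Selecting the `Action_At` column before calling `apply` also keeps the lambda working on a single Series rather than on the whole group frame.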

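Or, if you want the result as a number of days (note that `.dt.days` keeps only the whole-day part of each gap before averaging):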

df.groupby('User_ID').apply(lambda x: (x.Action_At - x.Action_At.shift()).dt.days.mean())

## User_ID
## 1    3.333333
## 2    5.333333
## 3    8.333333
## dtype: float64
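
Because `.dt.days` drops the fractional part of every gap (user 1's first gap of just under one hour counts as 0 days), the figures above are lower than the timedelta means shown earlier. If fractional days are wanted instead, one possible sketch is to convert each gap to seconds and divide by 86400. The column name `Gap_Days` and the variable `mean_days` are assumptions introduced here, not names from the question or the original answer.

```python
import pandas as pd

# Same sample data as above, reduced to the two columns the calculation needs.
df = pd.DataFrame({
    "User_ID": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
    "Action_At": [
        "2019-01-31T23:00:37Z", "2019-01-31T23:11:12Z", "2019-01-31T23:14:53Z",
        "2019-02-01T00:00:30Z", "2019-02-01T00:01:03Z", "2019-02-01T00:02:32Z",
        "2019-02-06T11:30:28Z", "2019-02-06T11:30:28Z", "2019-02-07T09:09:16Z",
        "2019-02-11T15:37:24Z", "2019-02-18T10:02:07Z", "2019-02-26T12:01:31Z",
    ],
})
df["Action_At"] = pd.to_datetime(df["Action_At"])
df = df.sort_values(["User_ID", "Action_At"])

# Gap to the previous action of the same user, as fractional days.
# The first action of each user has no predecessor and becomes NaN.
df["Gap_Days"] = (
    df.groupby("User_ID")["Action_At"].diff().dt.total_seconds() / 86400
)

# mean() skips the NaN rows, so only real gaps are averaged.
mean_days = df.groupby("User_ID")["Gap_Days"].mean()
print(mean_days.round(3))
```

Keeping the per-row `Gap_Days` column around also makes it easy to inspect individual gaps before aggregating.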