Group by ID and count consecutive days per ID in pandas

Time: 2019-03-21 10:35:30

Tags: python-3.x pandas time

If my pandas df looks like this:

import pandas as pd

df = pd.DataFrame({"id": [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3,4,4],
     "date": ("2000-07-06","2000-07-07","2000-07-08","2000-07-09","2000-07-10","2000-07-06","2000-07-10","2000-07-11","2000-07-17","2000-07-20","2000-07-06","2000-07-07","2000-07-08","2000-07-10","2000-07-15","2000-07-16","2000-07-25","2000-07-11","2000-07-20")})
df["date"] = pd.to_datetime(df["date"])  # convert the date strings to datetime64

         date     id
1   2000-07-06     1
2   2000-07-07     1
3   2000-07-08     1
4   2000-07-09     1
5   2000-07-10     1
6   2000-07-06     2
7   2000-07-10     2
8   2000-07-11     2
9   2000-07-17     2
10  2000-07-20     2
11  2000-07-06     3
12  2000-07-07     3
13  2000-07-08     3
14  2000-07-10     3
15  2000-07-15     3
16  2000-07-16     3
17  2000-07-25     3
18  2000-07-11     4
19  2000-07-20     4

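I would like to group by ID and count, for each ID, how many times a date falls exactly one day after the previous date, to get a result like this: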

   count     id
1    4        1
2    1        2
3    3        3
4    0        4

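I don't know whether building a loop is the best option, but I would like to know if anyone knows a quick method or a function that can do this. Thanks.

1 answer:

Answer 0: (score: 3)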

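You can use pandas.core.groupby.DataFrameGroupBy.diff and conditionally fill in 1 and 0 with np.where. After that, sum the 1's per id:

import numpy as np

# 1 where the gap to the previous date of the same id is exactly one day, else 0
df['diff'] = np.where(df.groupby('id')['date'].diff() == pd.Timedelta(days=1), 1, 0)

# select the 'diff' column explicitly; .diff on a groupby is a method, not the column
df_grouped = df.groupby('id')['diff'].sum()

print(df_grouped.reset_index().rename({'diff':'count'}, axis=1))

Output

   id  count
0   1      4
1   2      1
2   3      3
3   4      0

Alternatively, you can use .agg.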
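The code for the .agg variant did not survive on this page, so the following is only a sketch of what such a version might look like, not the answerer's original. It assumes pandas 0.25 or later (for named aggregation) and sorts by id and date first, which the original data already satisfies. The names is_consecutive, consecutive and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3,4,4],
    "date": pd.to_datetime([
        "2000-07-06","2000-07-07","2000-07-08","2000-07-09","2000-07-10",
        "2000-07-06","2000-07-10","2000-07-11","2000-07-17","2000-07-20",
        "2000-07-06","2000-07-07","2000-07-08","2000-07-10","2000-07-15",
        "2000-07-16","2000-07-25","2000-07-11","2000-07-20",
    ]),
})

# Sort so that diff() compares each row with the previous date of the same id.
df = df.sort_values(["id", "date"])

# 1 where the gap to the previous row of the same id is exactly one day, else 0.
# The first row of each id has a NaT diff, which compares as False.
is_consecutive = (
    df.groupby("id")["date"].diff().eq(pd.Timedelta(days=1)).astype(int)
)

# Sum the flags per id with .agg; named aggregation labels the column "count".
result = (
    df.assign(consecutive=is_consecutive)
      .groupby("id")
      .agg(count=("consecutive", "sum"))
      .reset_index()
)
print(result)
```

This avoids adding a permanent helper column to df and yields the same per-id counts as the np.where approach (4, 1, 3, 0 for ids 1 to 4).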