If my pandas DataFrame looks like this:
import pandas as pd
df = pd.DataFrame({"id": [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3,4,4],
                   "date": ["2000-07-06","2000-07-07","2000-07-08","2000-07-09","2000-07-10","2000-07-06","2000-07-10","2000-07-11","2000-07-17","2000-07-20","2000-07-06","2000-07-07","2000-07-08","2000-07-10","2000-07-15","2000-07-16","2000-07-25","2000-07-11","2000-07-20"]})
df["date"] = pd.to_datetime(df["date"])  # parse the strings into datetime64 values
date id
1 2000-07-06 1
2 2000-07-07 1
3 2000-07-08 1
4 2000-07-09 1
5 2000-07-10 1
6 2000-07-06 2
7 2000-07-10 2
8 2000-07-11 2
9 2000-07-17 2
10 2000-07-20 2
11 2000-07-06 3
12 2000-07-07 3
13 2000-07-08 3
14 2000-07-10 3
15 2000-07-15 3
16 2000-07-16 3
17 2000-07-25 3
18 2000-07-11 4
19 2000-07-20 4
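I want to group by id and count, for each id, how many times a date falls exactly one day after the previous date, so that I end up with a result like this: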
   count  id
1      4   1
2      1   2
3      3   3
4      0   4
I don't know whether writing a loop is the best option, so I'd like to know if anyone knows a quick way, or a function, to do this. Thanks.
Answer 0 (score: 3)
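You can use pandas.core.groupby.DataFrameGroupBy.diff and conditionally fill in 1 or 0 with np.where. After that, sum the 1's, that is, the rows where the difference from the previous date is 1 day:
import numpy as np
# 1 where the gap to the previous date within the same id is exactly one day, else 0
df['diff'] = np.where(df.groupby('id')['date'].diff() == '1 days', 1, 0)
# select the 'diff' column explicitly; .diff would resolve to the groupby method
df_grouped = df.groupby('id')['diff'].sum()
print(df_grouped.reset_index().rename({'diff':'count'}, axis=1))
Output:
   id  count
0   1      4
1   2      1
2   3      3
3   4      0
Alternatively, the same result can be obtained with .agg.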
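The code for the .agg variant is not included in the answer. As a minimal sketch of how it might look, one could aggregate the date column of each group directly, without creating a helper column. The names `one_day` and `result` are illustrative, and the explicit sort is an added assumption so that diff() always compares neighbouring dates within an id:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4],
    "date": pd.to_datetime([
        "2000-07-06", "2000-07-07", "2000-07-08", "2000-07-09", "2000-07-10",
        "2000-07-06", "2000-07-10", "2000-07-11", "2000-07-17", "2000-07-20",
        "2000-07-06", "2000-07-07", "2000-07-08", "2000-07-10", "2000-07-15",
        "2000-07-16", "2000-07-25", "2000-07-11", "2000-07-20",
    ]),
})

# Assumption: sort first, so diff() compares each date with the previous one of the same id.
df = df.sort_values(["id", "date"])

one_day = pd.Timedelta(days=1)  # illustrative name

# For each id, count the rows whose gap to the previous date is exactly one day.
result = (
    df.groupby("id")["date"]
      .agg(lambda s: int((s.diff() == one_day).sum()))
      .rename("count")
      .reset_index()
)
print(result)
```

This counts day-to-day steps rather than distinct runs of consecutive days, which matches the expected result in the question (4, 1, 3, 0).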