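Cumulative sums seem to be a common question, and also one that is hard to get right, even after reading other posts.
Here is my situation. I have this data: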
User | Timestamp | Period | Count
User1 | 2006-08-13 | Morning | 1
User1 | 2006-08-14 | Evening | 1
User1 | 2006-08-17 | Morning | 1
User1 | 2006-09-15 | Evening | 1
User2 | 2006-09-16 | Morning | 1
User2 | 2006-09-17 | Morning | 1
I want the same table, but with a cumulative count for each User + Period combination, like this:
User | Timestamp | Period | Count | CCount
User1 | 2006-08-13 | Morning | 1 | 1
User1 | 2006-08-14 | Evening | 1 | 1
User1 | 2006-08-17 | Morning | 1 | 2
User1 | 2006-09-15 | Evening | 1 | 2
User2 | 2006-09-16 | Morning | 1 | 1
User2 | 2006-09-17 | Morning | 1 | 2
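To pin down the expected result, here is a minimal plain-Python sketch (no pandas) of the running count. It assumes the rows are already in chronological order and adds up Count per (User, Period) key. The helper name running_counts is hypothetical.

```python
from collections import defaultdict

# Sample rows from the question: (User, Timestamp, Period, Count)
rows = [
    ("User1", "2006-08-13", "Morning", 1),
    ("User1", "2006-08-14", "Evening", 1),
    ("User1", "2006-08-17", "Morning", 1),
    ("User1", "2006-09-15", "Evening", 1),
    ("User2", "2006-09-16", "Morning", 1),
    ("User2", "2006-09-17", "Morning", 1),
]

def running_counts(rows):
    """Hypothetical helper: append CCount, the running total of Count per (User, Period)."""
    totals = defaultdict(int)  # (user, period) -> Count accumulated so far
    out = []
    for user, ts, period, count in rows:
        totals[(user, period)] += count
        out.append((user, ts, period, count, totals[(user, period)]))
    return out

for row in running_counts(rows):
    print(" | ".join(str(v) for v in row))
```

Each key keeps its own counter, so User1's Morning and Evening rows are numbered independently, just as in the desired table.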
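Answer (score: 1)
You can use cumcount on the groupby object. It numbers the rows within each group, which matches a cumulative sum of Count here because every Count is 1: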
# cumcount() numbers the rows of each (User, Period) group starting at 0, so add 1
df['CCount'] = df.groupby(['User', 'Period']).cumcount() + 1
df
Out:
User Timestamp Period Count CCount
0 User1 2006-08-13 Morning 1 1
1 User1 2006-08-14 Evening 1 1
2 User1 2006-08-17 Morning 1 2
3 User1 2006-09-15 Evening 1 2
4 User2 2006-09-16 Morning 1 1
5 User2 2006-09-17 Morning 1 2
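If Count can take values other than 1, a running sum of that column is the safer choice. Below is a self-contained sketch that rebuilds the question's data and uses groupby(...)['Count'].cumsum() next to cumcount(). It assumes rows should be accumulated in Timestamp order, so it sorts first; the extra column name CSum is hypothetical.

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    "User": ["User1"] * 4 + ["User2"] * 2,
    "Timestamp": pd.to_datetime([
        "2006-08-13", "2006-08-14", "2006-08-17",
        "2006-09-15", "2006-09-16", "2006-09-17",
    ]),
    "Period": ["Morning", "Evening", "Morning", "Evening", "Morning", "Morning"],
    "Count": [1] * 6,
})

# Both methods follow the current row order, so sort chronologically first
# (mergesort is stable, so rows with equal timestamps keep their relative order)
df = df.sort_values("Timestamp", kind="mergesort")

# Row number within each (User, Period) group, starting at 1
df["CCount"] = df.groupby(["User", "Period"]).cumcount() + 1

# Running total of the Count column within each (User, Period) group
df["CSum"] = df.groupby(["User", "Period"])["Count"].cumsum()

print(df)
```

With this data both columns come out as 1, 1, 2, 2, 1, 2. They would only differ if some Count were greater than 1.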