Question

I have a list of events, each with an ID and a date. For each event, I want to know how many earlier events share the same ID. For example:

import pandas as pd

rng = pd.date_range('1/1/2018', periods=10, freq='D')
df = pd.DataFrame({'id':[1,1,1,2,2,3,3,3,3,3], 'date':rng})

Input data frame:

    date       id
0   2018-01-01  1
1   2018-01-02  1
2   2018-01-03  1
3   2018-01-04  2
4   2018-01-05  2
5   2018-01-06  3
6   2018-01-07  3
7   2018-01-08  3
8   2018-01-09  3
9   2018-01-10  3

Desired output:

    date       id   occurrences
0   2018-01-01  1   0
1   2018-01-02  1   1
2   2018-01-03  1   2
3   2018-01-04  2   0
4   2018-01-05  2   1
5   2018-01-06  3   0
6   2018-01-07  3   1
7   2018-01-08  3   2
8   2018-01-09  3   3
9   2018-01-10  3   4

This is easy to do by looping over the rows, but I would like to know whether there is a more efficient way. Here is the row-by-row solution:

occurrences = []
for index, row in df.iterrows():
    occurrences.append(df[(df['id'] == row['id']) & (df['date'] < row['date'])].shape[0])

df['occurrences'] = occurrences

Answer 1

Group by id and use cumcount, which numbers the rows within each group starting from 0:

df.groupby('id').cumcount()

0    0
1    1
2    2
3    0
4    1
5    0
6    1
7    2
8    3
9    4
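As a quick sanity check, the sketch below rebuilds the frame from the question and compares the cumcount result with the row-by-row loop. The names `looped` and `vectorised` are introduced here for illustration only. It assumes, as in the example, that the rows are already in date order within each id.

```python
import pandas as pd

rng = pd.date_range('1/1/2018', periods=10, freq='D')
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 3], 'date': rng})

# Row-by-row baseline from the question: count earlier events with the same id.
looped = [
    int(((df['id'] == row['id']) & (df['date'] < row['date'])).sum())
    for _, row in df.iterrows()
]

# Vectorised version: position of each row within its id group.
vectorised = df.groupby('id').cumcount()

# Compare the two results element by element.
print(vectorised.tolist() == looped)
```

The loop scans the whole frame once per row, so its cost grows quadratically with the number of rows, while cumcount makes a single grouped pass.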

Note
To store the result in your df:

df['occurrences'] = df.groupby('id').cumcount()

Or (as @Scott suggested), use assign to get the following one-liner. Keep in mind that assign returns a new DataFrame and does not modify df in place:

df.assign(occurrences=df.groupby('id').cumcount())

Result

print(df)

    date       id   occurrences
0   2018-01-01  1   0
1   2018-01-02  1   1
2   2018-01-03  1   2
3   2018-01-04  2   0
4   2018-01-05  2   1
5   2018-01-06  3   0
6   2018-01-07  3   1
7   2018-01-08  3   2
8   2018-01-09  3   3
9   2018-01-10  3   4
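One caveat: cumcount counts by row position, so it matches the loop in the question only when each id's rows are in date order and no two rows of the same id share a date. The sketch below shows two possible ways to handle the general case. The frame `events` and the column names `by_position` and `strictly_earlier` are hypothetical, made up for this illustration.

```python
import pandas as pd

# Hypothetical data: rows out of date order, with one duplicated date for id 1.
events = pd.DataFrame({
    'id': [1, 1, 1, 2, 1],
    'date': pd.to_datetime(['2018-01-03', '2018-01-01', '2018-01-01',
                            '2018-01-02', '2018-01-02']),
})

# Option 1: sort by date (stable sort) so that cumcount follows chronological order.
# Assigning back aligns on the original index, so the row order is preserved.
events['by_position'] = (
    events.sort_values('date', kind='mergesort').groupby('id').cumcount()
)

# Option 2: count strictly earlier dates, so tied dates receive the same count.
# rank(method='min') is 1 + the number of strictly smaller values in the group.
events['strictly_earlier'] = (
    events.groupby('id')['date'].rank(method='min').astype(int) - 1
)

print(events)
```

Option 2 reproduces the strict `<` comparison used in the loop, while Option 1 treats a tied row that appears later in the frame as coming after the earlier one.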
