pandas groupby count on a datetime column based on multiple conditions

Time: 2020-04-07 08:42:44

Tags: pandas pandas-groupby

I have a data frame as shown below.

Doctor       Appointment           Booking_ID   No_Show   
  A          2020-01-18 12:00:00     1          0.25
  A          2020-01-18 12:30:00     2          0.28
  A          2020-01-18 13:00:00     3          0.35
  A          2020-01-18 13:00:00     4          0.75
  A          2020-01-18 14:00:00     5          0.65
  A          2020-01-18 14:00:00     6          0.35
  A          2020-01-18 15:00:00     7          0.25
  A          2020-01-19 12:00:00     1          0.25
  A          2020-01-19 12:00:00     2          0.95
  A          2020-01-19 13:00:00     3          0.35
  A          2020-01-19 13:00:00     4          0.75
  A          2020-01-19 14:00:00     5          0.65
  A          2020-01-19 14:00:00     6          0.85
  A          2020-01-19 14:00:00     7          0.35

From the above, I would like to prepare the data frame below.

Expected output:

Doctor       Appointment           Booking_ID   No_Show   Slot_Number     Patient_count
  A          2020-01-18 12:00:00     1          0.25      1               1
  A          2020-01-18 12:30:00     2          0.28      2               2
  A          2020-01-18 13:00:00     3          0.35      3               3
  A          2020-01-18 13:00:00     4          0.75      3               4
  A          2020-01-18 14:00:00     5          0.65      4               5
  A          2020-01-18 14:00:00     6          0.35      4               6
  A          2020-01-18 15:00:00     7          0.25      5               7
  A          2020-01-19 12:00:00     1          0.25      1               1
  A          2020-01-19 12:00:00     2          0.95      1               2
  A          2020-01-19 13:00:00     3          0.35      2               3
  A          2020-01-19 13:00:00     4          0.75      2               4
  A          2020-01-19 14:00:00     5          0.65      3               5
  A          2020-01-19 14:00:00     6          0.85      3               6
  A          2020-01-19 14:00:00     7          0.35      3               7

Explanation:

Slot_Number = 3 means the 3rd slot of that day for that doctor.
Patient_count = 3 means the 3rd patient of that day for the same doctor.

1 answer:

Answer 0 (score: 1)

Use GroupBy.transform with factorize in a lambda function to enumerate the slots, and GroupBy.cumcount for the patient counter:

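import pandas as pd

# Parse the column so the .dt accessor is available
df['Appointment'] = pd.to_datetime(df['Appointment'])

# One group per doctor per calendar day
g = df.groupby(['Doctor', df['Appointment'].dt.date])

# factorize numbers distinct appointment times in order of appearance (0-based)
df['Slot_Number'] = g['Appointment'].transform(lambda x: pd.factorize(x)[0]) + 1
# cumcount numbers the rows within each group (0-based)
df['Patient_count'] = g.cumcount() + 1
print(df)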
   Doctor         Appointment  Booking_ID  No_Show  Slot_Number  Patient_count
0       A 2020-01-18 12:00:00           1     0.25            1              1
1       A 2020-01-18 12:30:00           2     0.28            2              2
2       A 2020-01-18 13:00:00           3     0.35            3              3
3       A 2020-01-18 13:00:00           4     0.75            3              4
4       A 2020-01-18 14:00:00           5     0.65            4              5
5       A 2020-01-18 14:00:00           6     0.35            4              6
6       A 2020-01-18 15:00:00           7     0.25            5              7
7       A 2020-01-19 12:00:00           1     0.25            1              1
8       A 2020-01-19 12:00:00           2     0.95            1              2
9       A 2020-01-19 13:00:00           3     0.35            2              3
10      A 2020-01-19 13:00:00           4     0.75            2              4
11      A 2020-01-19 14:00:00           5     0.65            3              5
12      A 2020-01-19 14:00:00           6     0.85            3              6
13      A 2020-01-19 14:00:00           7     0.35            3              7
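
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same two steps. The helper slot_by_rank is a hypothetical name, not part of the answer above: it swaps factorize for a dense rank, which may be preferable if the rows are not already sorted by Appointment, since factorize numbers the slots in order of appearance rather than chronologically.

```python
import pandas as pd

# Sample data copied from the question (doctor A, two days).
df = pd.DataFrame({
    "Doctor": ["A"] * 14,
    "Appointment": [
        "2020-01-18 12:00:00", "2020-01-18 12:30:00", "2020-01-18 13:00:00",
        "2020-01-18 13:00:00", "2020-01-18 14:00:00", "2020-01-18 14:00:00",
        "2020-01-18 15:00:00", "2020-01-19 12:00:00", "2020-01-19 12:00:00",
        "2020-01-19 13:00:00", "2020-01-19 13:00:00", "2020-01-19 14:00:00",
        "2020-01-19 14:00:00", "2020-01-19 14:00:00",
    ],
    "Booking_ID": [1, 2, 3, 4, 5, 6, 7] * 2,
    "No_Show": [0.25, 0.28, 0.35, 0.75, 0.65, 0.35, 0.25,
                0.25, 0.95, 0.35, 0.75, 0.65, 0.85, 0.35],
})
df["Appointment"] = pd.to_datetime(df["Appointment"])

# Same approach as the answer: group per doctor per calendar day.
g = df.groupby(["Doctor", df["Appointment"].dt.date])
df["Slot_Number"] = g["Appointment"].transform(lambda x: pd.factorize(x)[0]) + 1
df["Patient_count"] = g.cumcount() + 1


def slot_by_rank(frame):
    """Hypothetical variant: number slots by dense rank of the timestamp.

    Unlike factorize, the result does not depend on the row order.
    """
    grouped = frame.groupby(["Doctor", frame["Appointment"].dt.date])
    return grouped["Appointment"].rank(method="dense").astype(int)


# On the sorted sample data both approaches agree.
print(df["Slot_Number"].equals(slot_by_rank(df).astype(df["Slot_Number"].dtype)))
```

Note that Patient_count still follows row order within each group, so sorting by Doctor and Appointment first is a sensible precaution when the input order is not guaranteed.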