Group by a datetime column in pandas

Date: 2020-04-07 05:14:42

Tags: pandas pandas-groupby

I have a dataframe as shown below.

Doctor       Appointment           Booking_ID   
  A          2020-01-18 12:00:00     1 
  A          2020-01-18 12:30:00     2
  A          2020-01-18 13:00:00     3 
  A          2020-01-18 13:00:00     4 
  B          2020-01-18 12:00:00     5 
  B          2020-01-18 12:30:00     6 
  B          2020-01-18 13:00:00     7
  B          2020-01-18 13:00:00     8 
  B          2020-01-18 13:00:00     9 
  B          2020-01-18 16:30:00     10 
  A          2020-01-19 12:00:00     11 
  A          2020-01-19 12:30:00     12 
  A          2020-01-19 13:00:00     13
  A          2020-01-19 13:30:00     14
  A          2020-01-19 14:00:00     15 
  A          2020-01-19 14:00:00     16 
  A          2020-01-19 14:00:00     17 
  A          2020-01-19 14:00:00     18 
  B          2020-01-19 12:00:00     19 
  B          2020-01-19 12:30:00     20
  B          2020-01-19 13:00:00     21
  B          2020-01-19 13:30:00     22 
  B          2020-01-19 14:00:00     23
  B          2020-01-19 13:30:00     24 
  B          2020-01-19 15:00:00     25 
  B          2020-01-18 15:30:00     26

From the above, I would like to find the number of bookings each doctor has at the same appointment time.
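As a starting point, here is a minimal sketch that rebuilds part of the sample data (Booking_ID 1 to 10 only, an assumption made to keep it short), parses Appointment into real timestamps with pd.to_datetime, and counts bookings per (Doctor, Appointment) pair. Note that this gives one row per pair, not one value per booking.

```python
import pandas as pd

# Subset of the question's data (Booking_ID 1-10); the full frame works the same way.
df = pd.DataFrame({
    "Doctor": ["A", "A", "A", "A", "B", "B", "B", "B", "B", "B"],
    "Appointment": [
        "2020-01-18 12:00:00", "2020-01-18 12:30:00",
        "2020-01-18 13:00:00", "2020-01-18 13:00:00",
        "2020-01-18 12:00:00", "2020-01-18 12:30:00",
        "2020-01-18 13:00:00", "2020-01-18 13:00:00",
        "2020-01-18 13:00:00", "2020-01-18 16:30:00",
    ],
    "Booking_ID": list(range(1, 11)),
})

# Parse the strings into datetime64 values so the column behaves as timestamps.
df["Appointment"] = pd.to_datetime(df["Appointment"])

# One row per (Doctor, Appointment) pair, holding the number of bookings in it.
counts = df.groupby(["Doctor", "Appointment"]).size()
print(counts)
```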

Expected output:

    Doctor           Appointment     Booking_ID   Number_of_Booking
      A          2020-01-18 12:00:00     1         1
      A          2020-01-18 12:30:00     2         1
      A          2020-01-18 13:00:00     3         2
      A          2020-01-18 13:00:00     4         2
      B          2020-01-18 12:00:00     5         1
      B          2020-01-18 12:30:00     6         1
      B          2020-01-18 13:00:00     7         3
      B          2020-01-18 13:00:00     8         3
      B          2020-01-18 13:00:00     9         3
      B          2020-01-18 16:30:00     10        1
      A          2020-01-19 12:00:00     11        1
      A          2020-01-19 12:30:00     12        1
      A          2020-01-19 13:00:00     13        1
      A          2020-01-19 13:30:00     14        1
      A          2020-01-19 14:00:00     15        4
      A          2020-01-19 14:00:00     16        4
      A          2020-01-19 14:00:00     17        4
      A          2020-01-19 14:00:00     18        4
      B          2020-01-19 12:00:00     19        1
      B          2020-01-19 12:30:00     20        1 
      B          2020-01-19 13:00:00     21        1
      B          2020-01-19 13:30:00     22        2
      B          2020-01-19 14:00:00     23        2
      B          2020-01-19 13:30:00     24        2 
      B          2020-01-19 14:00:00     25        2
      B          2020-01-18 15:30:00     26        1

Example:

At 2020-01-19 13:30:00, doctor B has two bookings, as shown below:

Doctor       Appointment           Booking_ID
B          2020-01-19 13:30:00     22
B          2020-01-19 13:30:00     24 

So the output should look like this:

 Doctor       Appointment           Booking_ID     Number_of_Booking
    B        2020-01-19 13:30:00     22             2
    B        2020-01-19 13:30:00     24             2
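For a single slot like this one, a boolean mask is enough to check the count by hand. This sketch uses doctor B's 2020-01-19 rows from the input table (Booking_ID 19 to 25); the variable names mask and n are just for illustration. It only handles one pair at a time, which is why a groupby is the better fit for filling the whole column.

```python
import pandas as pd

# Doctor B's bookings on 2020-01-19 (Booking_ID 19-25) from the question's input.
df = pd.DataFrame({
    "Doctor": ["B"] * 7,
    "Appointment": pd.to_datetime([
        "2020-01-19 12:00:00", "2020-01-19 12:30:00", "2020-01-19 13:00:00",
        "2020-01-19 13:30:00", "2020-01-19 14:00:00", "2020-01-19 13:30:00",
        "2020-01-19 15:00:00",
    ]),
    "Booking_ID": [19, 20, 21, 22, 23, 24, 25],
})

# True only for rows matching this (Doctor, Appointment) pair.
mask = (df["Doctor"] == "B") & (df["Appointment"] == pd.Timestamp("2020-01-19 13:30:00"))
n = int(mask.sum())  # number of True values = number of bookings in the slot
print(df.loc[mask, "Booking_ID"].tolist(), n)  # [22, 24] 2
```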

1 answer:

Answer 0 (score: 2)

Use GroupBy.transform together with GroupBy.size:

df['Number_of_Booking']=df.groupby(['Doctor','Appointment'])['Booking_ID'].transform('size')

print (df.head())
  Doctor          Appointment  Booking_ID  Number_of_Booking
0      A  2020-01-18 12:00:00           1                  1
1      A  2020-01-18 12:30:00           2                  1
2      A  2020-01-18 13:00:00           3                  2
3      A  2020-01-18 13:00:00           4                  2
4      B  2020-01-18 12:00:00           5                  1
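A self-contained version of the same idea, on a few rows from the question, might look like the sketch below. transform returns a Series aligned with the original index, so every booking receives the size of its own group instead of the result being collapsed to one row per pair. The missing Booking_ID is hypothetical (not in the question's data) and is only there to show that 'size' counts all rows while 'count' skips NaN values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Doctor": ["A", "A", "A", "A", "B"],
    "Appointment": pd.to_datetime([
        "2020-01-18 12:00:00", "2020-01-18 12:30:00",
        "2020-01-18 13:00:00", "2020-01-18 13:00:00",
        "2020-01-18 12:00:00",
    ]),
    # Hypothetical: Booking_ID 4 is set to NaN to contrast 'size' with 'count'.
    "Booking_ID": [1, 2, 3, np.nan, 5],
})

g = df.groupby(["Doctor", "Appointment"])["Booking_ID"]
# Each row gets the value computed for its own (Doctor, Appointment) group.
df["Number_of_Booking"] = g.transform("size")  # counts rows, NaN included
df["Non_null_IDs"] = g.transform("count")      # counts only non-missing Booking_ID
print(df)
```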

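If the whole DataFrame contains only one unique combination of Doctor and Appointment, as in the second sample, you can simply assign the length of the DataFrame: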

df['Number_of_Booking'] = len(df)
print (df)
  Doctor          Appointment  Booking_ID  Number_of_Booking
0      B  2020-01-19 13:30:00          22                  2
1      B  2020-01-19 13:30:00          24                  2
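If you are not sure in advance whether there is only one combination, a sketch like the following checks GroupBy.ngroups first and falls back to transform('size') otherwise. In the single-group case both give the same column, so the shortcut is only an optimisation.

```python
import pandas as pd

# The second sample from the question.
df = pd.DataFrame({
    "Doctor": ["B", "B"],
    "Appointment": pd.to_datetime(["2020-01-19 13:30:00", "2020-01-19 13:30:00"]),
    "Booking_ID": [22, 24],
})

groups = df.groupby(["Doctor", "Appointment"])
if groups.ngroups == 1:
    # Only one (Doctor, Appointment) pair, so every row is in the same group.
    df["Number_of_Booking"] = len(df)
else:
    df["Number_of_Booking"] = groups["Booking_ID"].transform("size")

# In the single-group case the shortcut agrees with the general approach.
assert (df["Number_of_Booking"] == groups["Booking_ID"].transform("size")).all()
print(df)
```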