I have a data frame as shown below:
Doctor Appointment Booking_ID No_Show
A 2020-01-18 12:00:00 1 0.25
A 2020-01-18 12:30:00 2 0.28
A 2020-01-18 13:00:00 3 0.35
A 2020-01-18 13:00:00 4 0.75
A 2020-01-18 14:00:00 5 0.65
A 2020-01-18 14:00:00 6 0.35
A 2020-01-18 15:00:00 7 0.25
A 2020-01-19 12:00:00 1 0.25
A 2020-01-19 12:00:00 2 0.95
A 2020-01-19 13:00:00 3 0.35
A 2020-01-19 13:00:00 4 0.75
A 2020-01-19 14:00:00 5 0.65
A 2020-01-19 14:00:00 6 0.85
A 2020-01-19 14:00:00 7 0.35
From the above, I would like to prepare the data frame below.
Expected output:
Doctor Appointment Booking_ID No_Show Slot_Number Patient_count
A 2020-01-18 12:00:00 1 0.25 1 1
A 2020-01-18 12:30:00 2 0.28 2 2
A 2020-01-18 13:00:00 3 0.35 3 3
A 2020-01-18 13:00:00 4 0.75 3 4
A 2020-01-18 14:00:00 5 0.65 4 5
A 2020-01-18 14:00:00 6 0.35 4 6
A 2020-01-18 15:00:00 7 0.25 5 7
A 2020-01-19 12:00:00 1 0.25 1 1
A 2020-01-19 12:00:00 2 0.95 1 2
A 2020-01-19 13:00:00 3 0.35 2 3
A 2020-01-19 13:00:00 4 0.75 2 4
A 2020-01-19 14:00:00 5 0.65 3 5
A 2020-01-19 14:00:00 6 0.85 3 6
A 2020-01-19 14:00:00 7 0.35 3 7
Explanation:
Slot_Number = 3 means the 3rd slot of that day for that doctor.
Patient_count = 3 means the 3rd patient of that day for the same doctor.
Answer 0 (score: 1)
Use GroupBy.transform with factorize in a lambda function to enumerate the distinct appointment times within each group, and GroupBy.cumcount for the running counter:
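import pandas as pd

df['Appointment'] = pd.to_datetime(df['Appointment'])
# One group per doctor per calendar day
g = df.groupby(['Doctor', df['Appointment'].dt.date])
# Number the distinct appointment times within each group, starting at 1
df['Slot_Number'] = g['Appointment'].transform(lambda x: pd.factorize(x)[0]) + 1
# Running row count within each group, starting at 1
df['Patient_count'] = g.cumcount() + 1
print(df)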
Doctor Appointment Booking_ID No_Show Slot_Number Patient_count
0 A 2020-01-18 12:00:00 1 0.25 1 1
1 A 2020-01-18 12:30:00 2 0.28 2 2
2 A 2020-01-18 13:00:00 3 0.35 3 3
3 A 2020-01-18 13:00:00 4 0.75 3 4
4 A 2020-01-18 14:00:00 5 0.65 4 5
5 A 2020-01-18 14:00:00 6 0.35 4 6
6 A 2020-01-18 15:00:00 7 0.25 5 7
7 A 2020-01-19 12:00:00 1 0.25 1 1
8 A 2020-01-19 12:00:00 2 0.95 1 2
9 A 2020-01-19 13:00:00 3 0.35 2 3
10 A 2020-01-19 13:00:00 4 0.75 2 4
11 A 2020-01-19 14:00:00 5 0.65 3 5
12 A 2020-01-19 14:00:00 6 0.85 3 6
13 A 2020-01-19 14:00:00 7 0.35 3 7
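
For anyone who wants to reproduce this from scratch, here is a minimal self-contained sketch of the same approach. It rebuilds the sample data from the question by hand. The helper name `day` and the key label "Day" are only illustrative assumptions (they are not in the original answer); renaming the grouping key just keeps it from sharing a name with the Appointment column.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Doctor": ["A"] * 14,
    "Appointment": [
        "2020-01-18 12:00:00", "2020-01-18 12:30:00", "2020-01-18 13:00:00",
        "2020-01-18 13:00:00", "2020-01-18 14:00:00", "2020-01-18 14:00:00",
        "2020-01-18 15:00:00", "2020-01-19 12:00:00", "2020-01-19 12:00:00",
        "2020-01-19 13:00:00", "2020-01-19 13:00:00", "2020-01-19 14:00:00",
        "2020-01-19 14:00:00", "2020-01-19 14:00:00",
    ],
    "Booking_ID": [1, 2, 3, 4, 5, 6, 7] * 2,
    "No_Show": [0.25, 0.28, 0.35, 0.75, 0.65, 0.35, 0.25,
                0.25, 0.95, 0.35, 0.75, 0.65, 0.85, 0.35],
})
df["Appointment"] = pd.to_datetime(df["Appointment"])

# Hypothetical helper: the calendar day of each appointment, used as a group key
day = df["Appointment"].dt.normalize().rename("Day")
g = df.groupby(["Doctor", day])

# factorize labels each distinct time in order of first appearance (0-based)
df["Slot_Number"] = g["Appointment"].transform(lambda x: pd.factorize(x)[0]) + 1
# cumcount is a 0-based running row counter within each group
df["Patient_count"] = g.cumcount() + 1

print(df)
```

Note that factorize numbers values in the order they first appear, so this relies on the rows already being sorted by appointment time within each day. If that is not guaranteed, sorting the frame first or passing sort=True to pd.factorize should give slot numbers in chronological order.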