I have a dataframe as shown below.
Doctor Appointment B_ID No_Show
A 2020-01-18 12:00:00 1 0.2
A 2020-01-18 12:30:00 2 0.3
A 2020-01-18 13:00:00 3 0.8
A 2020-01-18 13:00:00 4 0.3
A 2020-01-18 13:30:00 5 0.6
A 2020-01-18 14:00:00 6 0.8
A 2020-01-18 14:00:00 7 0.9
A 2020-01-18 14:00:00 8 0.4
A 2020-01-18 14:00:00 9 0.6
A 2020-01-19 12:00:00 12 0.9
A 2020-01-19 12:00:00 13 0.5
A 2020-01-19 13:00:00 14 0.3
A 2020-01-19 13:00:00 15 0.7
A 2020-01-19 14:00:00 16 0.6
A 2020-01-19 14:00:00 17 0.8
A 2020-01-19 14:00:00 19 0.3
No_Show = the probability that the patient does not show up.
From the above, I would like to prepare the dataframe below.
Expected output:
Doctor Appointment B_ID No_Show Session slot_num Patient_count
A 2020-01-18 12:00:00 1 0.2 S1 1 1
A 2020-01-18 12:30:00 2 0.3 S1 2 1
A 2020-01-18 13:00:00 3 0.8 S1 3 1
A 2020-01-18 13:00:00 4 0.3 S1 3 2
A 2020-01-18 13:30:00 5 0.6 S1 4 1
A 2020-01-18 14:00:00 6 0.8 S1 5 1
A 2020-01-18 14:00:00 7 0.9 S1 5 2
A 2020-01-18 14:00:00 8 0.4 S1 5 3
A 2020-01-18 14:00:00 9 0.6 S1 5 4
A 2020-01-19 12:00:00 12 0.9 S2 1 1
A 2020-01-19 12:00:00 13 0.5 S2 1 2
A 2020-01-19 12:30:00 14 0.3 S2 2 1
A 2020-01-19 13:00:00 15 0.7 S2 3 1
A 2020-01-19 13:30:00 15 0.7 S2 4 1
A 2020-01-19 14:00:00 16 0.6 S2 5 1
A 2020-01-19 14:00:00 17 0.8 S2 5 2
A 2020-01-19 14:00:00 19 0.3 S2 5 3
Explanation:
Session = each day is considered one session.
slot_num = the slot within that day (each slot is assumed to last 30 minutes).
Patient_count = the number of patients booked into the same session and the same slot.
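For reference, here is a minimal sketch (assuming pandas; the variable name df is mine) that reconstructs the sample dataframe shown above:

import pandas as pd

# Reconstruction of the sample data from the question
df = pd.DataFrame({
    'Doctor': ['A'] * 16,
    'Appointment': ['2020-01-18 12:00:00', '2020-01-18 12:30:00', '2020-01-18 13:00:00',
                    '2020-01-18 13:00:00', '2020-01-18 13:30:00', '2020-01-18 14:00:00',
                    '2020-01-18 14:00:00', '2020-01-18 14:00:00', '2020-01-18 14:00:00',
                    '2020-01-19 12:00:00', '2020-01-19 12:00:00', '2020-01-19 13:00:00',
                    '2020-01-19 13:00:00', '2020-01-19 14:00:00', '2020-01-19 14:00:00',
                    '2020-01-19 14:00:00'],
    'B_ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16, 17, 19],
    'No_Show': [0.2, 0.3, 0.8, 0.3, 0.6, 0.8, 0.9, 0.4, 0.6,
                0.9, 0.5, 0.3, 0.7, 0.6, 0.8, 0.3],
})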
Answer (score: 3):
Use Series.factorize on the appointment dates, add 1, prefix with S and convert to a Series of strings to build Session. The same idea, applied as a custom function inside GroupBy.transform, numbers the distinct appointment times within each group to create the new column slot_num, and GroupBy.cumcount produces Patient_count:
import pandas as pd

# Session: factorize the appointment dates so each distinct day becomes S1, S2, ...
df['Appointment'] = pd.to_datetime(df['Appointment'])
dates = df['Appointment'].dt.date
df['Session'] = 'S' + pd.Series(dates.factorize()[0] + 1, index=df.index).astype(str)
# slot_num: factorize the appointment times within each Doctor/Session group
f = lambda x: pd.factorize(x)[0]
df['slot_num'] = df.groupby(['Doctor', 'Session'])['Appointment'].transform(f) + 1
# Patient_count: cumulative count within each Doctor/Session/slot
df['Patient_count'] = df.groupby(['Doctor', 'Session', 'slot_num']).cumcount() + 1
print(df)
Doctor Appointment B_ID No_Show Session slot_num Patient_count
0 A 2020-01-18 12:00:00 1 0.2 S1 1 1
1 A 2020-01-18 12:30:00 2 0.3 S1 2 1
2 A 2020-01-18 13:00:00 3 0.8 S1 3 1
3 A 2020-01-18 13:00:00 4 0.3 S1 3 2
4 A 2020-01-18 13:30:00 5 0.6 S1 4 1
5 A 2020-01-18 14:00:00 6 0.8 S1 5 1
6 A 2020-01-18 14:00:00 7 0.9 S1 5 2
7 A 2020-01-18 14:00:00 8 0.4 S1 5 3
8 A 2020-01-18 14:00:00 9 0.6 S1 5 4
9 A 2020-01-19 12:00:00 12 0.9 S2 1 1
10 A 2020-01-19 12:30:00 13 0.5 S2 2 1
11 A 2020-01-19 13:00:00 14 0.3 S2 3 1
12 A 2020-01-19 13:30:00 15 0.7 S2 4 1
13 A 2020-01-19 14:00:00 16 0.6 S2 5 1
14 A 2020-01-19 14:00:00 17 0.8 S2 5 2
15 A 2020-01-19 14:00:00 19 0.3 S2 5 3
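As a usage note beyond the original answer, slot_num can also be sketched with GroupBy.rank using method='dense', which assigns consecutive integers to the distinct appointment times within each Doctor/Session group; this should agree with the factorize approach as long as rows are sorted by appointment time (the column name slot_num_alt below is hypothetical):

# Alternative sketch: dense-rank appointment times within each Doctor/Session group
df['slot_num_alt'] = (df.groupby(['Doctor', 'Session'])['Appointment']
                        .rank(method='dense')
                        .astype(int))
# Should match slot_num when appointments are already sorted by time
print(df[['Appointment', 'Session', 'slot_num', 'slot_num_alt']].head())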