我有一个如下所示的数据框
Doctor Start B_ID Session Finish
A 2020-01-18 12:00:00 1 S1 2020-01-18 12:33:00
A 2020-01-18 12:30:00 2 S1 2020-01-18 12:52:00
A 2020-01-18 13:00:00 3 S1 2020-01-18 13:23:00
A 2020-01-18 13:00:00 4 S1 2020-01-18 13:37:00
A 2020-01-18 13:30:00 5 S1 2020-01-18 13:56:00
A 2020-01-18 14:00:00 6 S1 2020-01-18 14:15:00
A 2020-01-18 14:00:00 7 S1 2020-01-18 14:28:00
A 2020-01-18 14:00:00 8 S1 2020-01-18 14:40:00
A 2020-01-18 14:00:00 9 S1 2020-01-18 15:01:00
A 2020-01-19 12:00:00 12 S2 2020-01-19 12:20:00
A 2020-01-19 12:30:00 13 S2 2020-01-19 12:40:00
A 2020-01-19 13:00:00 14 S2 2020-01-19 13:20:00
A 2020-01-19 13:30:00 15 S2 2020-01-19 13:40:00
A 2020-01-19 14:00:00 16 S2 2020-01-19 14:10:00
A 2020-01-19 14:00:00 17 S2 2020-01-19 14:20:00
A 2020-01-19 14:00:00 19 S2 2020-01-19 14:40:00
根据上述数据框,我想基于同一会话的先前完成时间和当前开始时间创建一个名为“ Actual_start_time”的列。
步骤
if previous finish > current start:
df['Actual_start'] = df['previous finish']
else:
df['Actual_start'] = df['Start']
预期输出:
Doctor Start B_ID Session Finish Actual_start
A 2020-01-18 12:00:00 1 S1 2020-01-18 12:33:00 2020-01-18 12:00:00
A 2020-01-18 12:30:00 2 S1 2020-01-18 12:52:00 2020-01-18 12:33:00
A 2020-01-18 13:00:00 3 S1 2020-01-18 13:23:00 2020-01-18 13:00:00
A 2020-01-18 13:00:00 4 S1 2020-01-18 13:37:00 2020-01-18 13:23:00
A 2020-01-18 13:30:00 5 S1 2020-01-18 13:56:00 2020-01-18 13:37:00
A 2020-01-18 14:00:00 6 S1 2020-01-18 14:15:00 2020-01-18 14:00:00
A 2020-01-18 14:00:00 7 S1 2020-01-18 14:28:00 2020-01-18 14:15:00
A 2020-01-18 14:00:00 8 S1 2020-01-18 14:40:00 2020-01-18 14:28:00
A 2020-01-18 14:00:00 9 S1 2020-01-18 15:01:00 2020-01-18 14:40:00
A 2020-01-19 12:00:00 12 S2 2020-01-19 12:20:00 2020-01-19 12:00:00
A 2020-01-19 12:30:00 13 S2 2020-01-19 12:40:00 2020-01-19 12:30:00
A 2020-01-19 13:00:00 14 S2 2020-01-19 13:20:00 2020-01-19 13:00:00
A 2020-01-19 13:30:00 15 S2 2020-01-19 13:40:00 2020-01-19 13:30:00
A 2020-01-19 14:00:00 16 S2 2020-01-19 14:10:00 2020-01-19 14:00:00
A 2020-01-19 14:00:00 17 S2 2020-01-19 14:20:00 2020-01-19 14:10:00
A 2020-01-19 14:00:00 19 S2 2020-01-19 14:40:00 2020-01-19 14:20:00
答案 0 :(得分:1)
使用DataFrameGroupBy.shift
,并将Start
列中group的第一个值替换为Series.fillna
:
df['Start'] = pd.to_datetime(df['Start'])
df['Finish'] = pd.to_datetime(df['Finish'])
s = df.groupby('Session')['Finish'].shift()
df['Actual_start'] = np.where(s.gt(df['Start']), s, df['Start'])
print (df)
Doctor Start B_ID Session Finish \
0 A 2020-01-18 12:00:00 1 S1 2020-01-18 12:33:00
1 A 2020-01-18 12:30:00 2 S1 2020-01-18 12:52:00
2 A 2020-01-18 13:00:00 3 S1 2020-01-18 13:23:00
3 A 2020-01-18 13:00:00 4 S1 2020-01-18 13:37:00
4 A 2020-01-18 13:30:00 5 S1 2020-01-18 13:56:00
5 A 2020-01-18 14:00:00 6 S1 2020-01-18 14:15:00
6 A 2020-01-18 14:00:00 7 S1 2020-01-18 14:28:00
7 A 2020-01-18 14:00:00 8 S1 2020-01-18 14:40:00
8 A 2020-01-18 14:00:00 9 S1 2020-01-18 15:01:00
9 A 2020-01-19 12:00:00 12 S2 2020-01-19 12:20:00
10 A 2020-01-19 12:30:00 13 S2 2020-01-19 12:40:00
11 A 2020-01-19 13:00:00 14 S2 2020-01-19 13:20:00
12 A 2020-01-19 13:30:00 15 S2 2020-01-19 13:40:00
13 A 2020-01-19 14:00:00 16 S2 2020-01-19 14:10:00
14 A 2020-01-19 14:00:00 17 S2 2020-01-19 14:20:00
15 A 2020-01-19 14:00:00 19 S2 2020-01-19 14:40:00
Actual_start
0 2020-01-18 12:00:00
1 2020-01-18 12:33:00
2 2020-01-18 13:00:00
3 2020-01-18 13:23:00
4 2020-01-18 13:37:00
5 2020-01-18 14:00:00
6 2020-01-18 14:15:00
7 2020-01-18 14:28:00
8 2020-01-18 14:40:00
9 2020-01-19 12:00:00
10 2020-01-19 12:30:00
11 2020-01-19 13:00:00
12 2020-01-19 13:30:00
13 2020-01-19 14:00:00
14 2020-01-19 14:10:00
15 2020-01-19 14:20:00