I have the following DataFrame with four columns. Let's call it df.
ID Start transfer Finish transfer Ward
0 7685933 04/11/2015 12:07 05/11/2015 12:49 General surgery
1 7685933 05/11/2015 12:49 11/11/2015 14:42 Anestesiology
2 7685933 11/11/2015 14:42 11/11/2015 16:12 Anestesiology
3 7685933 11/11/2015 16:12 18/11/2015 21:24 General surgery
4 7685933 18/11/2015 21:24 02/01/2016 06:45 ICU
5 7690142 06/11/2015 17:24 30/11/2015 18:11 Internal Medicine
6 7690142 30/11/2015 18:11 02/12/2015 17:04 Internal Medicine
7 7690142 02/12/2015 17:04 03/12/2015 20:40 Internal Medicine
8 7690142 03/12/2015 20:40 11/01/2016 18:00 Internal Medicine
9 7691888 08/11/2015 16:28 16/11/2015 17:11 Internal Medicine
10 7691888 16/11/2015 17:11 20/11/2015 18:13 Internal Medicine
11 7691888 20/11/2015 18:13 04/01/2016 18:02 Internal Medicine
12 7691888 04/01/2016 18:02 04/01/2016 21:13 Internal Medicine
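Now I want to group the data by the 'ID' column and then find consecutive rows with the same Ward, where the 'Finish transfer' of one row equals the 'Start transfer' of the next consecutive row with the same Ward name. Once these are identified, I need to copy the 'Finish transfer' entry from the last row of the consecutive run and use it to update the first row of that run. For example, the rows at index 1 and 2 have the same Ward, and the 'Finish transfer' of the row at index 1 equals the 'Start transfer' of the row at index 2. What I want is a single row per consecutive run, with 'Start transfer' taken from the first row and 'Finish transfer' taken from the last.
I would like the following output (possibly in a new DataFrame):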
ID Start transfer Finish transfer Ward
0 7685933 04/11/2015 12:07 05/11/2015 12:49 General surgery
1 7685933 05/11/2015 12:49 11/11/2015 16:12 Anestesiology
2 7685933 11/11/2015 16:12 18/11/2015 21:24 General surgery
3 7685933 18/11/2015 21:24 02/01/2016 06:45 ICU
4 7690142 06/11/2015 17:24 11/01/2016 18:00 Internal Medicine
5 7691888 08/11/2015 16:28 04/01/2016 21:13 Internal Medicine
Thanks in advance for your help.
Answer 0 (score: 1)
IIUC
df.groupby(['ID','Ward']).agg({'Start transfer':'first','Finish transfer':'last'}).reset_index()
Out[151]:
ID Ward Start transfer Finish transfer
0 7685933 Anestesiology 05/11/2015 12:49 11/11/2015 16:12
1 7685933 General surgery 04/11/2015 12:07 18/11/2015 21:24
2 7685933 ICU 18/11/2015 21:24 02/01/2016 06:45
3 7690142 Internal Medicine 06/11/2015 17:24 11/01/2016 18:00
4 7691888 Internal Medicine 08/11/2015 16:28 04/01/2016 21:13
Update (to merge only consecutive runs of the same Ward, rather than every row with that Ward):
df.assign(Key=(df.Ward.shift()!=df.Ward).cumsum()).groupby(['ID','Ward','Key']).agg({'Start transfer':'first','Finish transfer':'last'}).reset_index().sort_values('Key')
Out[181]:
ID Ward Key Start transfer Finish transfer
1 7685933 General surgery 1 04/11/2015 12:07 05/11/2015 12:49
0 7685933 Anestesiology 2 05/11/2015 12:49 11/11/2015 16:12
2 7685933 General surgery 3 11/11/2015 16:12 18/11/2015 21:24
3 7685933 ICU 4 18/11/2015 21:24 02/01/2016 06:45
4 7690142 Internal Medicine 5 06/11/2015 17:24 11/01/2016 18:00
5 7691888 Internal Medicine 5 08/11/2015 16:28 04/01/2016 21:13
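Note that Key in the output above repeats (5) across two IDs, because it is computed over the whole frame whenever Ward changes; the result is still correct only because ID is also part of the groupby. The update also does not check that 'Finish transfer' equals the next 'Start transfer'. A minimal sketch that adds both conditions, assuming the timestamps are day-first strings as in the question (the names prev, new_run, run_id and merged are only illustrative):

```python
import pandas as pd

rows = [
    (7685933, "04/11/2015 12:07", "05/11/2015 12:49", "General surgery"),
    (7685933, "05/11/2015 12:49", "11/11/2015 14:42", "Anestesiology"),
    (7685933, "11/11/2015 14:42", "11/11/2015 16:12", "Anestesiology"),
    (7685933, "11/11/2015 16:12", "18/11/2015 21:24", "General surgery"),
    (7685933, "18/11/2015 21:24", "02/01/2016 06:45", "ICU"),
    (7690142, "06/11/2015 17:24", "30/11/2015 18:11", "Internal Medicine"),
    (7690142, "30/11/2015 18:11", "02/12/2015 17:04", "Internal Medicine"),
    (7690142, "02/12/2015 17:04", "03/12/2015 20:40", "Internal Medicine"),
    (7690142, "03/12/2015 20:40", "11/01/2016 18:00", "Internal Medicine"),
    (7691888, "08/11/2015 16:28", "16/11/2015 17:11", "Internal Medicine"),
    (7691888, "16/11/2015 17:11", "20/11/2015 18:13", "Internal Medicine"),
    (7691888, "20/11/2015 18:13", "04/01/2016 18:02", "Internal Medicine"),
    (7691888, "04/01/2016 18:02", "04/01/2016 21:13", "Internal Medicine"),
]
df = pd.DataFrame(rows, columns=["ID", "Start transfer", "Finish transfer", "Ward"])

# Parse the day-first timestamps so they compare as datetimes, not strings
for col in ["Start transfer", "Finish transfer"]:
    df[col] = pd.to_datetime(df[col], format="%d/%m/%Y %H:%M")

# Previous row's Ward and Finish transfer *within the same ID*;
# the first row of every ID gets NaN/NaT here
prev = df.groupby("ID")[["Ward", "Finish transfer"]].shift()

# A new run starts when the Ward changes, when the stay is not contiguous
# in time, or at the first row of an ID (comparison with NaN/NaT is unequal)
new_run = (df["Ward"] != prev["Ward"]) | (df["Start transfer"] != prev["Finish transfer"])
run_id = new_run.cumsum().rename("run")

# Collapse each run to one row: first start, last finish
merged = (
    df.groupby(run_id)
      .agg({"ID": "first", "Start transfer": "first",
            "Finish transfer": "last", "Ward": "first"})
      .reset_index(drop=True)
)
print(merged)
```

This keeps the original column order and row order, so no sort_values on the key is needed afterwards. It assumes df is already sorted by ID and 'Start transfer'; if not, sort first.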