Question

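I am trying to build "episodes" from a list of transactions organized by group (patient). I used to do this in Stata, but I am not sure how to do it in Python. In Stata, I would write: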

by patient: replace startDate = startDate[_n-1] if startDate-endDate[_n-1]<10

In English, this means: start at the first row of each group and check whether the number of days between that row's startDate and the previous row's endDate is less than 10. If it is, replace that row's startDate with the previous row's startDate. Then move to the next row and do the same thing, then the next row... until you have run out of rows.

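I have been trying to figure out how to do the same thing in Python / pandas and have hit a wall. I could sort the DataFrame by patient and date and then iterate over the entire DataFrame, but it seems like there should be a better way to do this.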

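It is important that the script compares row 2 to row 1 first: if the script replaces the value in row 2, then when I get to row 3 I want to use the replaced value, not the original one.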
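To make the row-order dependency concrete, here is a minimal sketch of the explicit per-patient loop, assuming the data is already sorted by patient and date. The first five rows are the ones listed in this question; patient 3 and the helper name `merge_starts` are hypothetical additions, included only to show a replaced value being carried forward to the next row.

```python
import pandas as pd

# Rows for patients 1 and 2 come from the question; patient 3 is a
# hypothetical addition that exercises the chained replacement.
df = pd.DataFrame({
    "Patient": [1, 1, 1, 1, 2, 3, 3, 3],
    "startDate": ["1/1/2016", "1/11/2016", "1/28/2016", "6/15/2016",
                  "3/1/2016", "2/1/2016", "2/5/2016", "2/12/2016"],
    "endDate": ["1/2/2016", "1/12/2016", "1/28/2016", "6/16/2016",
                "3/1/2016", "2/3/2016", "2/8/2016", "2/13/2016"],
})
df["startDate"] = pd.to_datetime(df["startDate"], format="%m/%d/%Y")
df["endDate"] = pd.to_datetime(df["endDate"], format="%m/%d/%Y")
df = df.sort_values(["Patient", "startDate"]).reset_index(drop=True)


def merge_starts(group, max_gap_days=10):
    """Return the group's startDate values after the row-by-row replacement."""
    starts = group["startDate"].tolist()
    ends = group["endDate"].tolist()
    for i in range(1, len(starts)):
        # Compare this row's original start with the previous row's end;
        # on a match, copy the previous start, which may itself have been
        # replaced on an earlier iteration.
        if (starts[i] - ends[i - 1]).days < max_gap_days:
            starts[i] = starts[i - 1]
    return pd.Series(starts, index=group.index)


# Process each patient separately and align the results back by index.
df["startDate"] = pd.concat([merge_starts(g) for _, g in df.groupby("Patient")])
print(df)
```

For patient 3, the third row ends up with 2016-02-01 rather than 2016-02-05, because the second row's start was replaced before the third row was examined.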

Sample input:

endDate

Sample output:

Patient    startDate    endDate  
1          1/1/2016     1/2/2016  
1          1/11/2016    1/12/2016  
1          1/28/2016    1/28/2016  
1          6/15/2016    6/16/2016  
2          3/1/2016     3/1/2016

Answer 1

I think we need shift + groupby here, with bfill + mask being the key:

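import pandas as pd

df.startDate = pd.to_datetime(df.startDate)
df.endDate = pd.to_datetime(df.endDate)

# Per patient: mask startDate where it is under 10 days after the previous row's endDate (the first row's missing gap is filled with 0, so it is masked too), back-fill, then fall back to the original value.
df.startDate = df.groupby('Patient').apply(lambda x: x.startDate.mask((x.startDate - x.endDate.shift(1)).fillna(0).astype('timedelta64[D]') < 10).bfill()).reset_index(level=0, drop=True).fillna(df.startDate)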
df
Out[495]: 
   Patient  startDate    endDate
0        1 2016-01-28 2016-01-02
1        1 2016-01-28 2016-01-12
2        1 2016-01-28 2016-01-28
3        1 2016-06-15 2016-06-16
4        2 2016-03-01 2016-03-01
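As an alternative to mask + bfill, the rule in the question can also be sketched without an explicit loop. The condition only ever compares a row's original startDate with the previous row's endDate, which is never modified, so each row can be flagged up front as either opening a new episode or continuing the current one; every row then takes the first startDate of its episode. This is a sketch under that reading of the Stata command, not a tested drop-in replacement. The column names `prev_end`, `gap`, `new_episode`, and `episode` are illustrative, and patient 3 is hypothetical data added to show a chain of three rows.

```python
import pandas as pd

# Rows for patients 1 and 2 come from the question; patient 3 is hypothetical.
df = pd.DataFrame({
    "Patient": [1, 1, 1, 1, 2, 3, 3, 3],
    "startDate": ["1/1/2016", "1/11/2016", "1/28/2016", "6/15/2016",
                  "3/1/2016", "2/1/2016", "2/5/2016", "2/12/2016"],
    "endDate": ["1/2/2016", "1/12/2016", "1/28/2016", "6/16/2016",
                "3/1/2016", "2/3/2016", "2/8/2016", "2/13/2016"],
})
df["startDate"] = pd.to_datetime(df["startDate"], format="%m/%d/%Y")
df["endDate"] = pd.to_datetime(df["endDate"], format="%m/%d/%Y")
df = df.sort_values(["Patient", "startDate"]).reset_index(drop=True)

# Gap between each row's startDate and the previous row's endDate,
# computed within each patient (NaT on a patient's first row).
prev_end = df.groupby("Patient")["endDate"].shift(1)
gap = df["startDate"] - prev_end

# A row opens a new episode if it is the patient's first row or the gap
# is 10 days or more; otherwise it continues the current episode.
new_episode = gap.isna() | (gap >= pd.Timedelta(days=10))

# Every patient's first row is flagged, so a running count of the flags
# yields episode ids that never span two patients.
episode = new_episode.cumsum()

# Each row takes the first startDate of its episode.
df["startDate"] = df.groupby(episode)["startDate"].transform("first")
print(df)
```

On this data the second row of patient 1 becomes 2016-01-01 (a 9-day gap), the third row keeps 2016-01-28 (a 16-day gap), and all three rows of patient 3 share 2016-02-01.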

Pandas: conditional replacement on consecutive rows within a group
