Forward fill based on the values of other columns

Posted: 2019-01-11 17:28:43

Tags: python python-3.x pandas dataframe data-manipulation

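Update: I have a large pandas DataFrame with the columns admitTime, dischargeTime, pat_name and pat_rec, containing about 5 million records. I am trying to forward-fill the dischargeTime, pat_name and pat_rec columns based on each patient's dischargeTime datetime value, and then stop filling once a row's admitTime is past that dischargeTime.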

df:

admitTime            dischargeTime        pat_name  pat_rec
2013-12-23 20:20:30  2013-12-23 21:03:00  Alex      A4536
2013-12-23 21:00:30
2013-12-23 21:01:00
2013-12-23 21:01:30
2013-12-23 21:02:00
2013-12-23 21:02:30
2013-12-23 21:03:00
2013-12-23 21:03:30
2013-12-23 21:04:00
2013-12-23 21:04:30
2013-12-23 21:05:00  2013-12-23 21:08:30  Sam       A4523
2013-12-23 21:06:00
2013-12-23 21:06:30
2013-12-23 21:07:00
2013-12-23 21:07:30
2013-12-23 21:08:00
2013-12-23 21:08:30
2013-12-23 21:09:00
2013-12-23 21:09:30  2013-12-23 21:13:30  Mike      A9873
2013-12-23 21:10:00
2013-12-23 21:10:30
2013-12-23 21:11:00
2013-12-23 21:11:30
2013-12-23 21:12:00
2013-12-23 21:12:30
2013-12-23 21:13:00
2013-12-23 21:13:30
2013-12-23 21:14:00
2013-12-23 21:14:30
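To make the question reproducible, here is a minimal sketch that rebuilds a subset of the rows above as a DataFrame. It assumes the blank cells are missing values and that both time columns should be parsed with pd.to_datetime; the particular rows chosen are illustrative only, not the full 5-million-row data.

```python
import pandas as pd

# Illustrative subset of the table above; None marks a blank cell.
rows = [
    ("2013-12-23 20:20:30", "2013-12-23 21:03:00", "Alex", "A4536"),
    ("2013-12-23 21:00:30", None, None, None),
    ("2013-12-23 21:03:00", None, None, None),
    ("2013-12-23 21:03:30", None, None, None),
    ("2013-12-23 21:05:00", "2013-12-23 21:08:30", "Sam", "A4523"),
    ("2013-12-23 21:08:30", None, None, None),
    ("2013-12-23 21:09:00", None, None, None),
]
df = pd.DataFrame(rows, columns=["admitTime", "dischargeTime", "pat_name", "pat_rec"])

# Parse both time columns so later comparisons are chronological;
# missing entries become NaT.
df["admitTime"] = pd.to_datetime(df["admitTime"])
df["dischargeTime"] = pd.to_datetime(df["dischargeTime"])
print(df.dtypes)
```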

Ideally, I want my df to look like this:

admitTime            dischargeTime        pat_name  pat_rec
2013-12-23 20:20:30  2013-12-23 21:03:00  Alex      A4536
2013-12-23 21:00:30  2013-12-23 21:03:00  Alex      A4536
2013-12-23 21:01:00  2013-12-23 21:03:00  Alex      A4536
2013-12-23 21:01:30  2013-12-23 21:03:00  Alex      A4536
2013-12-23 21:02:00  2013-12-23 21:03:00  Alex      A4536
2013-12-23 21:02:30  2013-12-23 21:03:00  Alex      A4536
2013-12-23 21:03:00  2013-12-23 21:03:00  Alex      A4536
2013-12-23 21:03:30
2013-12-23 21:04:00
2013-12-23 21:04:30
2013-12-23 21:05:00  2013-12-23 21:08:30  Sam       A4523
2013-12-23 21:05:30  2013-12-23 21:08:30  Sam       A4523
2013-12-23 21:06:00  2013-12-23 21:08:30  Sam       A4523
2013-12-23 21:06:30  2013-12-23 21:08:30  Sam       A4523
2013-12-23 21:07:00  2013-12-23 21:08:30  Sam       A4523
2013-12-23 21:07:30  2013-12-23 21:08:30  Sam       A4523
2013-12-23 21:08:00  2013-12-23 21:08:30  Sam       A4523
2013-12-23 21:08:30  2013-12-23 21:08:30  Sam       A4523
2013-12-23 21:09:00
2013-12-23 21:09:30  2013-12-23 21:13:30  Mike      A9873
2013-12-23 21:10:00  2013-12-23 21:13:30  Mike      A9873
2013-12-23 21:10:30  2013-12-23 21:13:30  Mike      A9873
2013-12-23 21:11:00  2013-12-23 21:13:30  Mike      A9873
2013-12-23 21:11:30  2013-12-23 21:13:30  Mike      A9873
2013-12-23 21:12:00  2013-12-23 21:13:30  Mike      A9873
2013-12-23 21:12:30  2013-12-23 21:13:30  Mike      A9873
2013-12-23 21:13:00  2013-12-23 21:13:30  Mike      A9873
2013-12-23 21:13:30  2013-12-23 21:13:30  Mike      A9873
2013-12-23 21:14:00
2013-12-23 21:14:30

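I tried df[column_name].ffill(), but then realized this is not correct, because it also fills the rows whose admitTime is already past the previous patient's dischargeTime.

Any suggestions would be greatly appreciated.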
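A small sketch of why plain ffill overshoots, using a hypothetical four-row slice of Alex's data: ffill copies the last non-missing value into every following gap and has no notion of when that patient was discharged, so rows admitted after 21:03:00 still get Alex.

```python
import pandas as pd

# Hypothetical slice of the question's data (blank cells are None).
df = pd.DataFrame({
    "admitTime": pd.to_datetime(["2013-12-23 20:20:30", "2013-12-23 21:03:00",
                                 "2013-12-23 21:03:30", "2013-12-23 21:04:00"]),
    "dischargeTime": pd.to_datetime(["2013-12-23 21:03:00", None, None, None]),
    "pat_name": ["Alex", None, None, None],
})

# Plain forward fill copies "Alex" into every following gap ...
naive = df["pat_name"].ffill()
print(naive.tolist())  # ['Alex', 'Alex', 'Alex', 'Alex']

# ... including rows admitted after Alex's (filled) dischargeTime.
filled_discharge = df["dischargeTime"].ffill()
overshoot = df["admitTime"] > filled_discharge
print(df.loc[overshoot, "admitTime"].dt.strftime("%H:%M:%S").tolist())
# ['21:03:30', '21:04:00']
```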

1 answer:

Answer 0 (score: 0)

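You can forward fill, then use a boolean mask to set the values back to NaN on every row whose admitTime is later than the filled dischargeTime: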

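fill_cols = ['dischargeTime', 'pat_name', 'pat_rec']
# Forward fill every gap with the most recent patient's values
df[fill_cols] = df[fill_cols].ffill()
# Set the filled values back to NaN wherever admitTime is past the (filled) dischargeTime
df[fill_cols] = df[fill_cols].mask(df['admitTime'] > df['dischargeTime'])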
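A self-contained sketch of the same two steps on a hypothetical subset of the question's rows. Two assumptions beyond the answer: the frame should be in admitTime order (ffill depends on row order, so the sketch sorts first), and the masking step is written as an equivalent .loc assignment with np.nan instead of .mask.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the question's data; None marks a blank cell.
df = pd.DataFrame({
    "admitTime": pd.to_datetime([
        "2013-12-23 20:20:30", "2013-12-23 21:03:00", "2013-12-23 21:03:30",
        "2013-12-23 21:05:00", "2013-12-23 21:08:30", "2013-12-23 21:09:00",
    ]),
    "dischargeTime": pd.to_datetime([
        "2013-12-23 21:03:00", None, None, "2013-12-23 21:08:30", None, None,
    ]),
    "pat_name": ["Alex", None, None, "Sam", None, None],
    "pat_rec": ["A4536", None, None, "A4523", None, None],
})

fill_cols = ["dischargeTime", "pat_name", "pat_rec"]

# Assumption: rows must be in chronological order for ffill to be meaningful.
df = df.sort_values("admitTime", ignore_index=True)

# Step 1: forward fill every gap with the most recent patient's values.
df[fill_cols] = df[fill_cols].ffill()

# Step 2: blank out rows admitted after the filled dischargeTime
# (NaN becomes NaT in the datetime column).
past_discharge = df["admitTime"] > df["dischargeTime"]
df.loc[past_discharge, fill_cols] = np.nan
print(df)
```

Both steps are vectorized column operations, so this should scale to the roughly 5 million rows mentioned in the question without a Python-level loop.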