A B C D
0 2002-01-13 15:00:00 Joseph 3.9
1 2002-01-13 15:00:00 Emma 1.9
2 2002-01-13 16:00:00 Joseph 8.0
3 2002-01-13 16:00:00 Emma 9.0
4 2002-01-13 17:00:00 Joseph 6.2
5 2002-01-13 17:00:00 Emma 4.5
6 2002-06-23 15:00:00 David 0.2
7 2002-06-23 15:00:00 Rachel 6.7
8 2002-06-23 16:00:00 David 6.6
9 2002-06-23 16:00:00 Rachel 3.1
10 2002-06-23 17:00:00 David 1.0
11 2002-06-23 17:00:00 Rachel 3.2
My df is grouped by date and time.
I want to create a new column that, for each name in C, holds the D value from the row where B = 15:00:00.
It should be:
A B C D E
0 2002-01-13 15:00:00 Joseph 3.9 3.9 # fix E value
1 2002-01-13 15:00:00 Emma 1.9 1.9 # fix E value
2 2002-01-13 16:00:00 Joseph 8.0 3.9 # Joseph 3.9 for A column
3 2002-01-13 16:00:00 Emma 9.0 1.9 # Emma 1.9 for A column
4 2002-01-13 17:00:00 Joseph 6.2 3.9
5 2002-01-13 17:00:00 Emma 4.5 1.9
6 2002-06-23 15:00:00 David 0.2 0.2 # fix E value
7 2002-06-23 15:00:00 Rachel 6.7 6.7 # fix E value
8 2002-06-23 16:00:00 David 6.6 0.2
9 2002-06-23 16:00:00 Rachel 3.1 6.7
10 2002-06-23 17:00:00 David 1.0 0.2
11 2002-06-23 17:00:00 Rachel 3.2 6.7
Answer 0 (score: 3)
Perform a groupby (on C) + ffill on a masked version of column D:
df['E'] = df.D.mask(df.B.ne('15:00:00')).groupby(df.C).ffill()
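To make the one-liner above easier to try, here is a minimal self-contained sketch that rebuilds the sample frame from the question and applies the same mask + groupby + ffill steps. It assumes column B is stored as plain strings (not datetime.time objects); the intermediate name masked is only for illustration.

```python
import pandas as pd

# Sample data from the question; B is assumed to hold plain strings.
df = pd.DataFrame({
    "A": ["2002-01-13"] * 6 + ["2002-06-23"] * 6,
    "B": ["15:00:00", "15:00:00", "16:00:00",
          "16:00:00", "17:00:00", "17:00:00"] * 2,
    "C": ["Joseph", "Emma"] * 3 + ["David", "Rachel"] * 3,
    "D": [3.9, 1.9, 8.0, 9.0, 6.2, 4.5, 0.2, 6.7, 6.6, 3.1, 1.0, 3.2],
})

# Step 1: keep D only on the 15:00:00 rows; every other row becomes NaN.
masked = df["D"].mask(df["B"].ne("15:00:00"))

# Step 2: within each name in C, carry the 15:00:00 value forward.
df["E"] = masked.groupby(df["C"]).ffill()

print(df)
```

If the same name can appear on more than one date, grouping by both A and C, for example masked.groupby([df["A"], df["C"]]).ffill(), should keep the dates separate.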
If 15:00:00 is not the first timestamp in each group of C, chain in a bfill call:
df['E'] = df.D.mask(df.B.ne('15:00:00')).groupby(df.C).ffill().bfill()
df
A B C D E
0 2002-01-13 15:00:00 Joseph 3.9 3.9
1 2002-01-13 15:00:00 Emma 1.9 1.9
2 2002-01-13 16:00:00 Joseph 8.0 3.9
3 2002-01-13 16:00:00 Emma 9.0 1.9
4 2002-01-13 17:00:00 Joseph 6.2 3.9
5 2002-01-13 17:00:00 Emma 4.5 1.9
6 2002-06-23 15:00:00 David 0.2 0.2
7 2002-06-23 15:00:00 Rachel 6.7 6.7
8 2002-06-23 16:00:00 David 6.6 0.2
9 2002-06-23 16:00:00 Rachel 3.1 6.7
10 2002-06-23 17:00:00 David 1.0 0.2
11 2002-06-23 17:00:00 Rachel 3.2 6.7
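One caveat with the chained .bfill() above: it runs on the whole column rather than per group, so when rows of different names are interleaved, an earlier row can pick up the 15:00:00 value of a different name. The sketch below shows a possible per-group variant using transform. The 14:00:00 rows and their D values are hypothetical data made up for this illustration, and B is again assumed to be stored as strings.

```python
import pandas as pd

# Hypothetical frame where 15:00:00 is NOT the first timestamp per name.
df = pd.DataFrame({
    "A": ["2002-01-13"] * 4,
    "B": ["14:00:00", "14:00:00", "15:00:00", "15:00:00"],
    "C": ["Joseph", "Emma", "Joseph", "Emma"],
    "D": [5.0, 7.0, 3.9, 1.9],
})

# Keep D only on the 15:00:00 rows.
masked = df["D"].mask(df["B"].ne("15:00:00"))

# Chained version: ffill is per group, but bfill runs on the whole column,
# so Emma's 14:00:00 row is filled from Joseph's 15:00:00 row below it.
chained = masked.groupby(df["C"]).ffill().bfill()

# Per-group version: both fills stay inside each name's own rows.
per_group = masked.groupby(df["C"]).transform(lambda s: s.ffill().bfill())

df["E_chained"] = chained
df["E_per_group"] = per_group
print(df)
```

On the sample data from the question both versions give the same result, because every name starts with its 15:00:00 row and the trailing bfill has nothing left to fill.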