A B C D
0 2002-01-13 15:00:00 Joseph 3.9
1 2002-01-13 15:00:00 Emma 1.9
2 2002-01-13 16:00:00 Joseph 8.0
3 2002-01-13 16:00:00 Emma 9.0
4 2002-01-14 15:00:00 Joseph 0.2
5 2002-01-14 15:00:00 Emma 7.0
6 2002-01-14 16:00:00 Joseph 1.6
7 2002-01-14 16:00:00 Emma 3.4
I would like a new column df["E"] that takes the D value of "Joseph" and of "Emma" at 15:00:00 and keeps it fixed for the rest of each day.
The output should be:
A B C D E
0 2002-01-13 15:00:00 Joseph 3.9 3.9
1 2002-01-13 15:00:00 Emma 1.9 1.9
2 2002-01-13 16:00:00 Joseph 8.0 3.9
3 2002-01-13 16:00:00 Emma 9.0 1.9
4 2002-01-14 15:00:00 Joseph 0.2 0.2
5 2002-01-14 15:00:00 Emma 7.0 7.0
6 2002-01-14 16:00:00 Joseph 1.6 0.2
7 2002-01-14 16:00:00 Emma 3.4 7.0
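For anyone wanting to reproduce this, here is a minimal sketch that rebuilds the input frame above. It assumes B holds plain strings such as '15:00:00'; the printout does not show the actual dtypes.

```python
import pandas as pd

# Rebuild the question's input. Assumption: A and B are stored as strings,
# so comparisons such as df.B.ne('15:00:00') work directly.
df = pd.DataFrame({
    "A": ["2002-01-13"] * 4 + ["2002-01-14"] * 4,
    "B": ["15:00:00", "15:00:00", "16:00:00", "16:00:00"] * 2,
    "C": ["Joseph", "Emma"] * 4,
    "D": [3.9, 1.9, 8.0, 9.0, 0.2, 7.0, 1.6, 3.4],
})
print(df)
```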
Answer (score: 1)
Presumably, you want to groupby on A and C, and then use transform with first on D.
df['E'] = df.groupby(['A', 'C']).D.transform('first')
df
A B C D E
0 2002-01-13 15:00:00 Joseph 3.9 3.9
1 2002-01-13 15:00:00 Emma 1.9 1.9
2 2002-01-13 16:00:00 Joseph 8.0 3.9
3 2002-01-13 16:00:00 Emma 9.0 1.9
4 2002-01-14 15:00:00 Joseph 0.2 0.2
5 2002-01-14 15:00:00 Emma 7.0 7.0
6 2002-01-14 16:00:00 Joseph 1.6 0.2
7 2002-01-14 16:00:00 Emma 3.4 7.0
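A self-contained version of the snippet above, as a sketch. It assumes B is stored as strings and that the rows are in the order shown, so the 15:00:00 row is the first row of every (A, C) group.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["2002-01-13"] * 4 + ["2002-01-14"] * 4,
    "B": ["15:00:00", "15:00:00", "16:00:00", "16:00:00"] * 2,
    "C": ["Joseph", "Emma"] * 4,
    "D": [3.9, 1.9, 8.0, 9.0, 0.2, 7.0, 1.6, 3.4],
})

# Group by day (A) and person (C); transform('first') broadcasts the first
# D value of each group back onto every row of that group.
df["E"] = df.groupby(["A", "C"]).D.transform("first")
print(df)
```

Since 'first' just takes the first non-null D in each group's row order, this only yields the 15:00:00 value when that row comes first in its group.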
If the entries can start earlier than 15:00:00, mask D at every other time first (first skips the resulting NaNs), and then transform:
df['E'] = df.assign(
D=df.D.mask(df.B.ne('15:00:00'))
).groupby(['A', 'C']).D.transform('first')
df
A B C D E
0 2002-01-13 15:00:00 Joseph 3.9 3.9
1 2002-01-13 15:00:00 Emma 1.9 1.9
2 2002-01-13 16:00:00 Joseph 8.0 3.9
3 2002-01-13 16:00:00 Emma 9.0 1.9
4 2002-01-14 15:00:00 Joseph 0.2 0.2
5 2002-01-14 15:00:00 Emma 7.0 7.0
6 2002-01-14 16:00:00 Joseph 1.6 0.2
7 2002-01-14 16:00:00 Emma 3.4 7.0
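To see why the masking matters, here is a sketch with hypothetical 14:00:00 rows (invented for illustration, not from the question) that appear before the 15:00:00 entries. B is again assumed to hold strings.

```python
import pandas as pd

# Hypothetical data: each person also has an earlier 14:00:00 entry.
df = pd.DataFrame({
    "A": ["2002-01-13"] * 6,
    "B": ["14:00:00", "14:00:00", "15:00:00", "15:00:00", "16:00:00", "16:00:00"],
    "C": ["Joseph", "Emma"] * 3,
    "D": [5.5, 6.6, 3.9, 1.9, 8.0, 9.0],
})

# Without masking, 'first' picks the 14:00:00 value of each group.
naive = df.groupby(["A", "C"]).D.transform("first")

# Masking D wherever B is not 15:00:00 turns those values into NaN;
# GroupBy.first skips NaN, so every row gets its group's 15:00:00 value.
masked = (
    df.assign(D=df.D.mask(df.B.ne("15:00:00")))
    .groupby(["A", "C"])
    .D.transform("first")
)

print(df.assign(naive=naive, masked=masked))
```

If a person has no 15:00:00 entry on some day, all of that group's D values are masked and E comes out as NaN for the whole group.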