Fix a value based on conditions on multiple columns

Date: 2018-03-14 18:28:04

Tags: python pandas

            A         B       C    D
0  2002-01-13  15:00:00  Joseph  3.9
1  2002-01-13  15:00:00    Emma  1.9
2  2002-01-13  16:00:00  Joseph  8.0
3  2002-01-13  16:00:00    Emma  9.0
4  2002-01-14  15:00:00  Joseph  0.2
5  2002-01-14  15:00:00    Emma  7.0
6  2002-01-14  16:00:00  Joseph  1.6
7  2002-01-14  16:00:00    Emma  3.4

I would like to get a new column df["E"] that fixes the D value of "Joseph" and "Emma" at 15:00:00 and keeps it for the rest of each day.

The output should be:

            A         B       C    D     E
0  2002-01-13  15:00:00  Joseph  3.9   3.9
1  2002-01-13  15:00:00    Emma  1.9   1.9
2  2002-01-13  16:00:00  Joseph  8.0   3.9
3  2002-01-13  16:00:00    Emma  9.0   1.9
4  2002-01-14  15:00:00  Joseph  0.2   0.2
5  2002-01-14  15:00:00    Emma  7.0   7.0
6  2002-01-14  16:00:00  Joseph  1.6   0.2
7  2002-01-14  16:00:00    Emma  3.4   7.0

1 Answer:

Answer 0 (score: 1)

Presumably, you want to groupby on A and C, and then call transform with 'first' on D:

df['E'] = df.groupby(['A', 'C']).D.transform('first')
df

            A         B       C    D    E
0  2002-01-13  15:00:00  Joseph  3.9  3.9
1  2002-01-13  15:00:00    Emma  1.9  1.9
2  2002-01-13  16:00:00  Joseph  8.0  3.9
3  2002-01-13  16:00:00    Emma  9.0  1.9
4  2002-01-14  15:00:00  Joseph  0.2  0.2
5  2002-01-14  15:00:00    Emma  7.0  7.0
6  2002-01-14  16:00:00  Joseph  1.6  0.2
7  2002-01-14  16:00:00    Emma  3.4  7.0
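A self-contained version of the snippet above, rebuilding the question's sample data (assuming A and B are stored as plain strings, which is what the later comparison with '15:00:00' implies). transform('first') takes the first non-null D of each (A, C) group in the current row order and broadcasts it back onto every row of that group, so it relies on the 15:00:00 row coming first within each day.

```python
import pandas as pd

# Sample data from the question; A and B are assumed to be plain strings.
df = pd.DataFrame({
    "A": ["2002-01-13"] * 4 + ["2002-01-14"] * 4,
    "B": ["15:00:00", "15:00:00", "16:00:00", "16:00:00"] * 2,
    "C": ["Joseph", "Emma"] * 4,
    "D": [3.9, 1.9, 8.0, 9.0, 0.2, 7.0, 1.6, 3.4],
})

# One group per (day, person). transform("first") returns a Series with the
# same index as df, holding each group's first non-null D on every row,
# so it can be assigned directly as a new column.
df["E"] = df.groupby(["A", "C"])["D"].transform("first")

print(df)
```

If the rows are not guaranteed to be in time order, sorting first (for example df.sort_values("B").groupby(["A", "C"])["D"].transform("first")) should still assign correctly, because the transformed Series keeps the original index labels and column assignment aligns on them.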

If a day's entries can start earlier than 15:00:00, first mask D (so every value not recorded at 15:00:00 becomes NaN), then transform:

df['E'] = df.assign(
    D=df.D.mask(df.B.ne('15:00:00'))
).groupby(['A', 'C']).D.transform('first')

df
            A         B       C    D    E
0  2002-01-13  15:00:00  Joseph  3.9  3.9
1  2002-01-13  15:00:00    Emma  1.9  1.9
2  2002-01-13  16:00:00  Joseph  8.0  3.9
3  2002-01-13  16:00:00    Emma  9.0  1.9
4  2002-01-14  15:00:00  Joseph  0.2  0.2
5  2002-01-14  15:00:00    Emma  7.0  7.0
6  2002-01-14  16:00:00  Joseph  1.6  0.2
7  2002-01-14  16:00:00    Emma  3.4  7.0
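To see why the mask matters, here is a small sketch that adds two hypothetical 14:00:00 rows (they are not part of the question's data) to a single day. Without the mask, 'first' picks up the 14:00:00 value; with it, the non-15:00:00 values are NaN, and since groupby's first skips NaN, each group falls through to its 15:00:00 value.

```python
import pandas as pd

# One day of the question's data plus two HYPOTHETICAL 14:00:00 rows
# (values 5.5 and 6.5 are made up for illustration).
df = pd.DataFrame({
    "A": ["2002-01-13"] * 6,
    "B": ["14:00:00", "14:00:00", "15:00:00", "15:00:00", "16:00:00", "16:00:00"],
    "C": ["Joseph", "Emma"] * 3,
    "D": [5.5, 6.5, 3.9, 1.9, 8.0, 9.0],
})
keys = ["A", "C"]

# Without masking, "first" returns the earliest row's D, i.e. the 14:00:00 value.
unmasked = df.groupby(keys)["D"].transform("first")

# With masking, D is NaN wherever B is not "15:00:00"; "first" skips NaN,
# so every row gets its group's 15:00:00 value.
masked = (
    df.assign(D=df["D"].mask(df["B"].ne("15:00:00")))
      .groupby(keys)["D"]
      .transform("first")
)

df["E"] = masked
print(df.assign(unmasked=unmasked))
```

One consequence of this design: a (day, person) group with no 15:00:00 row at all ends up with NaN in E, since every D in that group is masked.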