I have a DataFrame that looks like this:
      emp    job  phase   cat  hours  equipnum  equipcode  equiphours   equipdate
 0  OO003  19713    95L  9512      1      None       None         0.0  2020-01-24
 1  OO003  19713    95L  9512      1      None       None         0.0  2020-01-24
 2  OO003  19713    95L  9512      1      None       None         0.0  2020-01-24
 3  OO003  19713    95L  9512      1      None       None         0.0  2020-01-24
 4  OO003  19526     OH   MAT      1    AIR012      E-REV         1.0  2020-01-24
 5  OO003  19526     OH   MAT      1    AIR012      E-REV         1.0  2020-01-24
 6  OO003  19526     OH   MAT      1    AIR012      E-REV         1.0  2020-01-24
 7  OO003  19486    52L  5212      1      None       None         0.0  2020-01-24
 8  OO003  19486    52L  5212      1      None       None         0.0  2020-01-24
 9  OO003  19486    52L  5212      1      None       None         0.0  2020-01-24
10  UR003  19713    95L  9512      1      None       None         0.0  2020-01-24
11  UR003  19713    95L  9512      1      None       None         0.0  2020-01-24
12  UR003  19713    95L  9512      1      None       None         0.0  2020-01-24
13  UR003  19526     OH   MAT      1      None       None         0.0  2020-01-24
14  UR003  19526     OH   MAT      1      None       None         0.0  2020-01-24
15  UR003  19526     OH   MAT      1      None       None         0.0  2020-01-24
16  UR003  19526     OH   MAT      1      None       None         0.0  2020-01-24
17  UR003  19526     OH   MAT      1      None       None         0.0  2020-01-24
18  UR003  19526     OH   MAT      1      None       None         0.0  2020-01-24
19  UR003  19526     OH   MAT      1      None       None         0.0  2020-01-24
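Is there a way to sum the hours column over the first 8 rows, and then group the last 2 rows separately, for each unique employee number (emp)?
The final DataFrame should look like this: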
     emp    job  phase   cat  hours  equipnum  equipcode  equiphours   equipdate
0  OO003  19713    95L  9512      4      None       None         0.0  2020-01-24
1  OO003  19526     OH   MAT      3    AIR012      E-REV         1.0  2020-01-24
2  OO003  19486    52L  5212      1      None       None         0.0  2020-01-24
3  OO003  19486    52L  5212      2      None       None         0.0  2020-01-24
4  UR003  19713    95L  9512      3      None       None         0.0  2020-01-24
5  UR003  19526     OH   MAT      5      None       None         0.0  2020-01-24
6  UR003  19526     OH   MAT      2      None       None         0.0  2020-01-24
Thanks for your help!
Answer 0 (score: 0)
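You need two groupby operations. The first builds a running total of hours worked within each employee. Then group by employee, job, and whether that running total is <= 8, and aggregate the columns accordingly.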
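To make the grouping key concrete, here is a minimal sketch of the first step on a made-up miniature frame. The `demo` frame, its values, and the names `running` and `within_8` are illustrative assumptions, not the question's data.

```python
import pandas as pd

# Hypothetical miniature frame: employee A1 has ten 1-hour rows, B2 has two
demo = pd.DataFrame({
    "emp": ["A1"] * 10 + ["B2"] * 2,
    "job": [100] * 4 + [200] * 3 + [300] * 3 + [100] * 2,
    "hours": [1] * 12,
})

# Running total of hours, restarted for each employee
running = demo.groupby("emp")["hours"].cumsum()

# True while the employee is still within the first 8 hours
within_8 = running.le(8)

print(demo.assign(running=running, within_8=within_8))
```

Grouping by `job`, `emp`, and this boolean flag is what splits a job that crosses the 8-hour mark into two output rows.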
# Running total of hours within each employee
s = df.groupby('emp').hours.cumsum()
#s = df.groupby('emp').cumcount()+1  # If truly rows, not hours

# `first` for everything but hours and group keys. `sum` for hours
agg_d = {x: 'first' for x in df.columns.difference(['hours', 'job', 'emp'])}
agg_d['hours'] = 'sum'

# Group by job, employee, and whether the running total is still <= 8
res = (df.groupby(['job', 'emp', s.le(8).rename('drop')], sort=False)
         .agg(agg_d)
         .reset_index()
         .drop(columns='drop'))
print(res)
     job    emp   cat  equipcode   equipdate  equiphours  equipnum  phase  hours
0  19713  OO003  9512       None  2020-01-24         0.0      None    95L      4
1  19526  OO003   MAT      E-REV  2020-01-24         1.0    AIR012     OH      3
2  19486  OO003  5212       None  2020-01-24         0.0      None    52L      1
3  19486  OO003  5212       None  2020-01-24         0.0      None    52L      2
4  19713  UR003  9512       None  2020-01-24         0.0      None    95L      3
5  19526  UR003   MAT       None  2020-01-24         0.0      None     OH      5
6  19526  UR003   MAT       None  2020-01-24         0.0      None     OH      2
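For anyone who wants to reproduce this end to end, the sketch below rebuilds the question's sample data and applies the same two-step grouping. The `specs` helper list and the flag name `within_8` are my own scaffolding, and the dtypes are assumed (`job` as integers, `equipdate` as plain strings), since the question does not show them. Note that the split only lands exactly on 8 hours because every row is 1 hour; a row whose hours straddle the boundary would go wholly into the overflow group.

```python
import pandas as pd

# (emp, job, phase, cat, equipnum, equipcode, equiphours, number of 1-hour rows)
specs = [
    ("OO003", 19713, "95L", "9512", None, None, 0.0, 4),
    ("OO003", 19526, "OH", "MAT", "AIR012", "E-REV", 1.0, 3),
    ("OO003", 19486, "52L", "5212", None, None, 0.0, 3),
    ("UR003", 19713, "95L", "9512", None, None, 0.0, 3),
    ("UR003", 19526, "OH", "MAT", None, None, 0.0, 7),
]
cols = ["emp", "job", "phase", "cat", "equipnum", "equipcode", "equiphours"]

# Repeat each spec once per row so the frame matches the 20 rows in the question
rows = [dict(zip(cols, spec[:-1])) for spec in specs for _ in range(spec[-1])]
df = pd.DataFrame(rows)
df["hours"] = 1
df["equipdate"] = "2020-01-24"  # assumed to be a plain string

# Step 1: running total of hours within each employee
s = df.groupby("emp")["hours"].cumsum()

# Step 2: keep the first value of descriptive columns, sum the hours
agg_d = {c: "first" for c in df.columns.difference(["hours", "job", "emp"])}
agg_d["hours"] = "sum"

res = (
    df.groupby(["job", "emp", s.le(8).rename("within_8")], sort=False)
      .agg(agg_d)
      .reset_index()
      .drop(columns="within_8")
)

print(res[["emp", "job", "hours"]])
```

With this data the `hours` column comes out as 4, 3, 1, 2, 3, 5, 2, matching the result requested in the question.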