I want to calculate employees' working hours based on some conditions. Here is the sample data:
df=pd.DataFrame({'ID':[1001,1002,1003,1004,1005,1006],'In Punch':['2019-07-28 08:27:25','30-07-2019 08:10:56','05-08-2019 19:44:12','06-08-2019 08:28:51','25-08-2019 08:03:50','08-08-2019 12:44:12'],'Out Punch':['2019-07-28 08:27:25','30-07-2019 19:48:28','05-08-2019 19:44:12','06-08-2019 19:47:21','25-08-2019 19:40:05','08-08-2019 12:44:12']})
I want output like this:
     ID            In Punch           Out Punch    Hours
0  1001 2019-07-28 08:27:25 2019-07-28 08:27:25 08:00:00
1  1002 2019-07-30 08:10:56 2019-07-30 19:48:28 11:37:32
2  1003 2019-05-08 19:44:12 2019-05-08 19:44:12 04:00:00
3  1004 2019-06-08 08:28:51 2019-06-08 19:47:21 11:18:30
4  1005 2019-08-25 08:03:50 2019-08-25 19:40:05 11:36:15
5  1006 2019-08-08 12:44:12 2019-08-08 12:44:12 04:00:00
df['Hours'] should be created under these conditions:
1. If df['Out Punch'] - df['In Punch'] == 00:00:00, check df['In Punch']:
   if df['In Punch'] is at or before 12:00 noon, then
   df['Hours'] = pd.Timedelta(8, unit='H') (just insert/update 8:00 hours);
   else if df['In Punch'] is between 12:00 and 14:00, then
   df['Hours'] = pd.Timedelta(4, unit='H') (insert/update 4:00 hours);
   otherwise
   df['Hours'] = pd.Timedelta(4, unit='H') (insert/update 4:00 hours).
2. If df['Out Punch'] - df['In Punch'] != 00:00:00, then
   df['Hours'] = df['Out Punch'] - df['In Punch'].
I tried this:
def create(df):
    if df['Out Punch'] - df['In Punch'] == pd.Timedelta(0):
        if pd.to_timedelta(df['In Punch']) <= pd.Timedelta(12, unit='H'):
            return pd.Timedelta(8, unit='H')
        elif pd.to_timedelta(t['In Punch']) > pd.Timedelta(12, unit='H') | pd.to_timedelta(t['In Punch']) <= pd.Timedelta(14, unit='H'):
            return pd.Timedelta(4, unit='H')
        else:
            return pd.Timedelta(4, unit='H')
    else:
        df['Out Punch'] - df['In Punch']

df['Out Punch'] = pd.to_datetime(df['Out Punch'])
df['In Punch'] = pd.to_datetime(df['In Punch'])
df['Hours'] = df.apply(create, axis=1)
But it gives this error:
ValueError: ('Value must be Timedelta, string, integer, float, timedelta or convertible', 'occurred at index 0')
Any suggestions?
Answer 0 (score: 1)
Use numpy.select:
import numpy as np
import pandas as pd

#convert both columns to datetimes
df[['In Punch', 'Out Punch']] = df[['In Punch', 'Out Punch']].apply(pd.to_datetime)
s = df['Out Punch'] - df['In Punch']
#convert times to timedeltas
td = pd.to_timedelta(df['In Punch'].dt.strftime('%H:%M:%S'))
#compare difference s and timedeltas td
m1 = s == pd.Timedelta(0)
m2 = td <= pd.Timedelta(12, unit='H')
m3 = (td > pd.Timedelta(12, unit='H')) & (td <= pd.Timedelta(14, unit='H'))
m4 = td > pd.Timedelta(15, unit='H')
#output Series
s2 = td + pd.Timedelta(8, unit='H')
s3 = td + pd.Timedelta(4, unit='H')
s4 = td - pd.Timedelta(4, unit='H')
masks =[(m1 & m2), (m1 & m3), (m1 & m4)]
vals = [s2, s3, s4]
#set output by conditions
df['Hours'] = np.select(masks, vals, default=s)
print (df)
     ID            In Punch           Out Punch    Hours
0  1001 2019-07-28 08:27:25 2019-07-28 08:27:25 16:27:25
1  1002 2019-07-30 08:10:56 2019-07-30 19:48:28 11:37:32
2  1003 2019-05-08 19:44:12 2019-05-08 19:44:12 15:44:12
3  1004 2019-06-08 08:28:51 2019-06-08 19:47:21 11:18:30
4  1005 2019-08-25 08:03:50 2019-08-25 19:40:05 11:36:15
5  1006 2019-08-08 12:44:12 2019-08-08 12:44:12 16:44:12
Edit:
df[['In Punch', 'Out Punch']] = df[['In Punch', 'Out Punch']].apply(pd.to_datetime)
s = df['Out Punch'] - df['In Punch']
td = pd.to_timedelta(df['In Punch'].dt.strftime('%H:%M:%S'))
m1 = s == pd.Timedelta(0)
m2 = td <= pd.Timedelta(12, unit='H')
m3 = (td > pd.Timedelta(12, unit='H')) & (td <= pd.Timedelta(14, unit='H'))
m4 = td > pd.Timedelta(15, unit='H')
s2 = np.timedelta64(8, 'h')
s3 = np.timedelta64(4, 'h')
#parenthesize m3 | m4 so that m1 applies to both (& binds tighter than |)
masks = [(m1 & m2), (m1 & (m3 | m4))]
vals = [s2, s3]
df['Hours'] = np.select(masks, vals, default=s)
print (df)
     ID            In Punch           Out Punch    Hours
0  1001 2019-07-28 08:27:25 2019-07-28 08:27:25 08:00:00
1  1002 2019-07-30 08:10:56 2019-07-30 19:48:28 11:37:32
2  1003 2019-05-08 19:44:12 2019-05-08 19:44:12 04:00:00
3  1004 2019-06-08 08:28:51 2019-06-08 19:47:21 11:18:30
4  1005 2019-08-25 08:03:50 2019-08-25 19:40:05 11:36:15
5  1006 2019-08-08 12:44:12 2019-08-08 12:44:12 04:00:00
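For reference, here is a self-contained sketch of the same numpy.select idea, not a replacement for the code above. It rests on a few assumptions that are not in the original: the sample punches are rewritten as ISO strings (reading the DD-MM-YYYY values as day-first), the two 4-hour branches are merged into a single "after 12:00" case because the question assigns 4 hours to both, and the time of day is computed as In Punch minus its own midnight. That last step avoids calling pd.to_timedelta on a Timestamp, which is the call that raises the ValueError in the question. The names worked, time_of_day, no_span and morning are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample punches from the question, rewritten as ISO strings
# (assumption: the DD-MM-YYYY values are day-first).
df = pd.DataFrame({
    "ID": [1001, 1002, 1003, 1004, 1005, 1006],
    "In Punch": pd.to_datetime([
        "2019-07-28 08:27:25", "2019-07-30 08:10:56", "2019-08-05 19:44:12",
        "2019-08-06 08:28:51", "2019-08-25 08:03:50", "2019-08-08 12:44:12",
    ]),
    "Out Punch": pd.to_datetime([
        "2019-07-28 08:27:25", "2019-07-30 19:48:28", "2019-08-05 19:44:12",
        "2019-08-06 19:47:21", "2019-08-25 19:40:05", "2019-08-08 12:44:12",
    ]),
})

# Elapsed time between the two punches.
worked = df["Out Punch"] - df["In Punch"]
# Time since midnight of the In Punch, as a timedelta.
time_of_day = df["In Punch"] - df["In Punch"].dt.normalize()

# Boolean masks as plain NumPy arrays.
no_span = (worked == pd.Timedelta(0)).to_numpy()
morning = (time_of_day <= pd.Timedelta(hours=12)).to_numpy()

# Zero span at or before noon -> 8 hours, zero span after noon -> 4 hours,
# anything else keeps the real difference.
df["Hours"] = np.select(
    [no_span & morning, no_span & ~morning],
    [np.timedelta64(8, "h"), np.timedelta64(4, "h")],
    default=worked.to_numpy(),
)
print(df)
```

Because every zero-span row falls into exactly one of the two masks, no In Punch time (for example between 14:00 and 15:00) is left to fall through to the default.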
Answer 1 (score: 0)
You need to convert the columns' dtype to something pandas recognizes for datetime arithmetic:
import pandas as pd
df['column_name'] = pd.to_datetime(df['column_name'])
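One caveat with the sample data: the punch columns mix two string formats ('2019-07-28 08:27:25' and '30-07-2019 08:10:56'). Depending on the pandas version, a single pd.to_datetime call may either guess the day/month order per value or refuse the mixed formats. A minimal sketch that parses each format explicitly is shown below; parse_punch is a hypothetical helper, and it assumes the second format is day-first (DD-MM-YYYY), so '05-08-2019' is read as 5 August rather than the 2019-05-08 shown in the printed output above. The computed Hours are unaffected, since both punches in a row share the same date.

```python
import pandas as pd

def parse_punch(col: pd.Series) -> pd.Series:
    """Parse strings that are either 'YYYY-MM-DD HH:MM:SS' or 'DD-MM-YYYY HH:MM:SS'.

    Hypothetical helper; assumes the second format is day-first.
    """
    # Values that start with a four-digit year are treated as ISO.
    is_iso = col.str.match(r"\d{4}-")
    # Parse each subset with its own explicit format; the other rows become NaT.
    iso = pd.to_datetime(col.where(is_iso), format="%Y-%m-%d %H:%M:%S")
    dmy = pd.to_datetime(col.where(~is_iso), format="%d-%m-%Y %H:%M:%S")
    # Combine the two partial results.
    return iso.fillna(dmy)

punches = pd.Series(
    ["2019-07-28 08:27:25", "30-07-2019 08:10:56", "05-08-2019 19:44:12"]
)
parsed = parse_punch(punches)
print(parsed)
```

It would be applied to both columns, for example df['In Punch'] = parse_punch(df['In Punch']), before any subtraction.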