Calculate hours conditionally in Python

Posted: 2019-08-30 10:23:05

Tags: python pandas dataframe timestamp timedelta

I want to calculate employees' working hours based on some conditions. Here is sample data:

import pandas as pd

df = pd.DataFrame({'ID': [1001, 1002, 1003, 1004, 1005, 1006],
                   'In Punch': ['2019-07-28 08:27:25', '30-07-2019  08:10:56', '05-08-2019  19:44:12',
                                '06-08-2019  08:28:51', '25-08-2019  08:03:50', '08-08-2019  12:44:12'],
                   'Out Punch': ['2019-07-28 08:27:25', '30-07-2019  19:48:28', '05-08-2019  19:44:12',
                                 '06-08-2019  19:47:21', '25-08-2019  19:40:05', '08-08-2019  12:44:12']})

I want output like this:

     ID    In Punch             Out Punch              Hours
0  1001    2019-07-28 08:27:25  2019-07-28 08:27:25    08:00:00
1  1002    2019-07-30 08:10:56  2019-07-30 19:48:28    11:37:32
2  1003    2019-05-08 19:44:12  2019-05-08 19:44:12    04:00:00
3  1004    2019-06-08 08:28:51  2019-06-08 19:47:21    11:18:30
4  1005    2019-08-25 08:03:50  2019-08-25 19:40:05    11:36:15
5  1006    2019-08-08 12:44:12  2019-08-08 12:44:12    04:00:00
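The sample mixes two layouts: ISO (YYYY-MM-DD) in the first row and DD-MM-YYYY with a double space in the others. Calling pd.to_datetime without hints reads the ambiguous values month-first, which is why 05-08-2019 and 06-08-2019 appear as 2019-05-08 and 2019-06-08 above even though 30-07-2019 suggests the data is day-first; pandas 2.0 and later instead infer a single format from the first value and raise on rows that don't match it. A minimal sketch, assuming only these two layouts ever occur, parses each one explicitly and combines the results (raw, clean, iso and dmy are illustrative names):

```python
import pandas as pd

# The question's In Punch strings: one ISO value, the rest DD-MM-YYYY
raw = pd.Series(['2019-07-28 08:27:25', '30-07-2019  08:10:56', '05-08-2019  19:44:12',
                 '06-08-2019  08:28:51', '25-08-2019  08:03:50', '08-08-2019  12:44:12'])

# Collapse the double spaces so a single-space format string fits every value
clean = raw.str.split().str.join(' ')

# Parse each layout explicitly; values that don't match become NaT
iso = pd.to_datetime(clean, format='%Y-%m-%d %H:%M:%S', errors='coerce')
dmy = pd.to_datetime(clean, format='%d-%m-%Y %H:%M:%S', errors='coerce')

# Use the ISO result where it exists, otherwise the day-first one
parsed = iso.fillna(dmy)
print(parsed)
```

With explicit formats nothing is guessed, so 05-08-2019 becomes 5 August 2019. Any row matching neither format stays NaT, which makes unexpected layouts easy to spot with parsed.isna().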

The Hours column should be created with these conditions:

1. If df['Out Punch'] - df['In Punch'] == 00:00:00, check df['In Punch']:

   - If df['In Punch'] is at or before 12:00 noon, then df['Hours'] = pd.Timedelta(8, unit='H') (just insert/update 8:00 hours).
   - Else, if df['In Punch'] is between 12:00 and 14:00, then df['Hours'] = pd.Timedelta(4, unit='H') (insert/update 4:00 hours).
   - Otherwise, df['Hours'] = pd.Timedelta(4, unit='H') (insert/update 4:00 hours).

2. If df['Out Punch'] - df['In Punch'] != 00:00:00, then df['Hours'] = df['Out Punch'] - df['In Punch'].
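Taken together, the rules reduce to this: keep the measured duration unless the two punches are identical; for identical punches, credit 8 hours if In Punch is at or before 12:00 and 4 hours otherwise, since the 12:00-14:00 branch and the final "otherwise" branch give the same result. A sketch of that logic for one pair of timestamps (hours_worked is a hypothetical helper name, not part of pandas):

```python
import pandas as pd


def hours_worked(in_punch: pd.Timestamp, out_punch: pd.Timestamp) -> pd.Timedelta:
    """Hypothetical helper applying the question's rules to one pair of punches."""
    worked = out_punch - in_punch
    if worked != pd.Timedelta(0):
        # Rule 2: the punches differ, so keep the measured duration
        return worked
    # Rule 1: identical punches, so look at the clock time of In Punch
    time_of_day = in_punch - in_punch.normalize()
    if time_of_day <= pd.Timedelta(hours=12):
        return pd.Timedelta(hours=8)
    # The 12:00-14:00 branch and the "otherwise" branch both give 4 hours
    return pd.Timedelta(hours=4)


print(hours_worked(pd.Timestamp('2019-07-30 08:10:56'), pd.Timestamp('2019-07-30 19:48:28')))
```

Comparing time_of_day against a Timedelta works because subtracting midnight of the same day turns a point in time into a duration.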

This is what I tried:

def create(df):
    if df['Out Punch'] - df['In Punch'] == pd.Timedelta(0):
        if pd.to_timedelta(df['In Punch']) <=  pd.Timedelta(12, unit='H'):          
            return pd.Timedelta(8, unit='H')      
        elif pd.to_timedelta(t['In Punch']) > pd.Timedelta(12, unit='H') | pd.to_timedelta(t['In Punch']) <= pd.Timedelta(14, unit='H'):
            return pd.Timedelta(4, unit='H')
        else:
            return pd.Timedelta(4, unit='H')
    else:
        df['Out Punch'] - df['In Punch']

df['Out Punch'] = pd.to_datetime(df['Out Punch']) ; df['In Punch'] = pd.to_datetime(df['In Punch'])

df['Hours'] = df.apply(create, axis=1)

But it raises this error:

ValueError: ('Value must be Timedelta, string, integer, float, timedelta or convertible', 'occurred at index 0')

Any suggestions?
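The ValueError comes from pd.to_timedelta(df['In Punch']): with apply(axis=1) each value is a single Timestamp, a point in time rather than a duration, so it cannot be converted. Three further problems in the attempt would surface next: the elif refers to an undefined name t; `a > b | c <= d` is parsed as the chained comparison `a > (b | c) <= d` because `|` binds more tightly than comparison operators; and the outer else computes the difference without returning it, so those rows would get None. This snippet reproduces the failure on one timestamp and shows one way to obtain a comparable time of day:

```python
import pandas as pd

stamp = pd.Timestamp('2019-07-28 08:27:25')  # one In Punch value, as apply(axis=1) sees it

# A Timestamp is not a duration, so pd.to_timedelta rejects it
# (the pandas version in the question raised ValueError)
try:
    pd.to_timedelta(stamp)
    rejected = False
except (TypeError, ValueError) as exc:
    rejected = True
    print(type(exc).__name__, exc)

# Time elapsed since midnight of the same day is a Timedelta that can be compared
time_of_day = stamp - stamp.normalize()
print(time_of_day)                            # 0 days 08:27:25
print(time_of_day <= pd.Timedelta(hours=12))  # True
```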

2 answers:

Answer 0 (score: 1)

Use numpy.select:

import numpy as np
import pandas as pd

#convert both columns to datetimes
df[['In Punch', 'Out Punch']] = df[['In Punch', 'Out Punch']].apply(pd.to_datetime)

s = df['Out Punch'] - df['In Punch']

#convert times to timedeltas
td = pd.to_timedelta(df['In Punch'].dt.strftime('%H:%M:%S'))

#compare difference s and timedeltas td
m1 = s == pd.Timedelta(0)    
m2 = td <= pd.Timedelta(12, unit='H')
m3 = (td > pd.Timedelta(12, unit='H')) & (td <= pd.Timedelta(14, unit='H'))
m4 = td > pd.Timedelta(14, unit='H')

#output Series
s2 = td + pd.Timedelta(8, unit='H')
s3 = td + pd.Timedelta(4, unit='H')
s4 = td - pd.Timedelta(4, unit='H')

masks =[(m1 & m2), (m1 & m3), (m1 & m4)]
vals = [s2, s3, s4]

#set output by conditions
df['Hours'] = np.select(masks, vals, default=s)
print (df)
     ID            In Punch           Out Punch    Hours
0  1001 2019-07-28 08:27:25 2019-07-28 08:27:25 16:27:25
1  1002 2019-07-30 08:10:56 2019-07-30 19:48:28 11:37:32
2  1003 2019-05-08 19:44:12 2019-05-08 19:44:12 15:44:12
3  1004 2019-06-08 08:28:51 2019-06-08 19:47:21 11:18:30
4  1005 2019-08-25 08:03:50 2019-08-25 19:40:05 11:36:15
5  1006 2019-08-08 12:44:12 2019-08-08 12:44:12 16:44:12
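The answer derives the time of day by formatting each timestamp as HH:MM:SS text and parsing it back with pd.to_timedelta. Subtracting each timestamp's own midnight with Series.dt.normalize() gives the same Timedelta values without the string round trip. A small comparison on three of the question's punches, written here in ISO form so they parse unambiguously:

```python
import pandas as pd

# Three In Punch values from the question, in a single (ISO) layout
punch_in = pd.to_datetime(pd.Series(['2019-07-28 08:27:25', '2019-07-30 08:10:56',
                                     '2019-08-08 12:44:12']))

# The answer's approach: format the clock time as text, then parse it as a duration
via_text = pd.to_timedelta(punch_in.dt.strftime('%H:%M:%S'))

# Same values without the text round trip: subtract midnight of the same day
via_normalize = punch_in - punch_in.dt.normalize()

print((via_text == via_normalize).all())  # True
```

The normalize version also keeps sub-second precision, which the '%H:%M:%S' format string drops.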

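Edit: the first version added the In Punch time of day to the fixed durations (td + 8 h, td + 4 h, td - 4 h), so rows 0, 2 and 5 did not match the expected output. This version assigns 8 or 4 hours directly: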

df[['In Punch', 'Out Punch']]  = df[['In Punch', 'Out Punch']].apply(pd.to_datetime)

s = df['Out Punch'] - df['In Punch']

td = pd.to_timedelta(df['In Punch'].dt.strftime('%H:%M:%S'))

m1 = s == pd.Timedelta(0)
m2 = td <= pd.Timedelta(12, unit='H')
m3 = (td > pd.Timedelta(12, unit='H')) & (td <= pd.Timedelta(14, unit='H'))
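m4 = td > pd.Timedelta(14, unit='H')

#fixed durations for identical punches
s2 = np.timedelta64(8, 'h')
s3 = np.timedelta64(4, 'h')

#& binds tighter than |, so m1 must be applied to (m3 | m4) explicitly
masks =[(m1 & m2), (m1 & (m3 | m4))]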
vals = [s2, s3]

df['Hours'] = np.select(masks, vals, default=s)

print (df)
     ID            In Punch           Out Punch    Hours
0  1001 2019-07-28 08:27:25 2019-07-28 08:27:25 08:00:00
1  1002 2019-07-30 08:10:56 2019-07-30 19:48:28 11:37:32
2  1003 2019-05-08 19:44:12 2019-05-08 19:44:12 04:00:00
3  1004 2019-06-08 08:28:51 2019-06-08 19:47:21 11:18:30
4  1005 2019-08-25 08:03:50 2019-08-25 19:40:05 11:36:15
5  1006 2019-08-08 12:44:12 2019-08-08 12:44:12 04:00:00
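Because identical punches only ever receive 8 or 4 hours, the same result can also be built in two steps with Series.mask and Series.where instead of np.select. A sketch under that reading of the rules, on a three-row subset of the sample already converted to datetimes (worked, time_of_day and fixed are illustrative names):

```python
import pandas as pd

# Three rows of the question's data, already parsed to datetimes
df = pd.DataFrame({
    'In Punch': pd.to_datetime(['2019-07-28 08:27:25', '2019-07-30 08:10:56',
                                '2019-08-08 12:44:12']),
    'Out Punch': pd.to_datetime(['2019-07-28 08:27:25', '2019-07-30 19:48:28',
                                 '2019-08-08 12:44:12']),
})

worked = df['Out Punch'] - df['In Punch']
time_of_day = df['In Punch'] - df['In Punch'].dt.normalize()

# Credit for identical punches: 4 h by default, 8 h if In Punch is at or before noon
fixed = pd.Series(pd.Timedelta(hours=4), index=df.index)
fixed = fixed.mask(time_of_day <= pd.Timedelta(hours=12), pd.Timedelta(hours=8))

# Keep the measured duration wherever the punches differ, else use the fixed credit
df['Hours'] = worked.where(worked != pd.Timedelta(0), fixed)
print(df)
```

Every intermediate stays a timedelta64 Series, so there is no mixing of NumPy and pandas duration units.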

Answer 1 (score: 0)

You need to convert the columns' dtype to one that pandas recognizes for datetime arithmetic:

import pandas as pd
df['column_name'] = pd.to_datetime(df['column_name'])
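A short illustration of that conversion on one row of the sample: after pd.to_datetime, subtracting the two columns yields a timedelta64 column. On its own this does not apply the 8-hour/4-hour rules; it only makes the subtraction and time comparisons in the other code valid.

```python
import pandas as pd

# One row of the question's data, still stored as strings
df = pd.DataFrame({'In Punch': ['2019-07-30 08:10:56'],
                   'Out Punch': ['2019-07-30 19:48:28']})

# Convert each column so that '-' performs datetime arithmetic
for col in ['In Punch', 'Out Punch']:
    df[col] = pd.to_datetime(df[col])

df['Hours'] = df['Out Punch'] - df['In Punch']
print(df['Hours'].iloc[0])  # 0 days 11:37:32
```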