Using NumPy and Timedelta

Time: 2017-11-08 13:40:59

Tags: python pandas numpy time timedelta

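I am creating a new column, df['inc_cr_date_adjusted'], from two existing columns of df, inc_cr_date_day and inc_cr_date, but the code below does not work as expected. It raises no error, yet the conditions are not applied correctly: only the Saturday and Sunday conditions work. The conditions that involve the hour and minute do not work, because the day is not added (Timedelta('1 days')). The other problem is that most dates should fall through to the last argument of the call, df['inc_cr_date'], and stay unchanged, but that does not happen either.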

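My conditions are based on the day of the week and on the hour and minute in inc_cr_date. The cutoff times are 9:30 and 18:30, and the hour and minute tests are combined with &.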

The code is:

df['inc_cr_date_day'] = df['inc_cr_date'].dt.weekday_name

df['inc_cr_date_adjusted'] = np.select([(df['inc_cr_date_day'] == 'Saturday'),#condition working
                              (df['inc_cr_date_day'] == 'Sunday'),#condition working
                              ((df['inc_cr_date_day'] == 'Monday')& (df['inc_cr_date'].dt.hour > 18 ) & df['inc_cr_date'].dt.minute > 30),
                              ((df['inc_cr_date_day'] == 'Monday')& (df['inc_cr_date'].dt.hour < 9 ) & df['inc_cr_date'].dt.minute < 30),
                              ((df['inc_cr_date_day'] == 'Tuesday')& (df['inc_cr_date'].dt.hour > 18 ) & df['inc_cr_date'].dt.minute > 30),
                              ((df['inc_cr_date_day'] == 'Tuesday')& (df['inc_cr_date'].dt.hour < 9 ) & df['inc_cr_date'].dt.minute < 30),
                              ((df['inc_cr_date_day'] == 'Wednesday')& (df['inc_cr_date'].dt.hour > 18 ) & df['inc_cr_date'].dt.minute > 30),
                              ((df['inc_cr_date_day'] == 'Wednesday')& (df['inc_cr_date'].dt.hour < 9 ) & df['inc_cr_date'].dt.minute < 30),
                              ((df['inc_cr_date_day'] == 'Thursday')& (df['inc_cr_date'].dt.hour > 18 ) & df['inc_cr_date'].dt.minute > 30),
                              ((df['inc_cr_date_day'] == 'Thursday')& (df['inc_cr_date'].dt.hour < 9 ) & df['inc_cr_date'].dt.minute < 30),
                              ((df['inc_cr_date_day'] == 'Friday')& (df['inc_cr_date'].dt.hour > 18 ) & df['inc_cr_date'].dt.minute > 30),
                              ((df['inc_cr_date_day'] == 'Friday')& (df['inc_cr_date'].dt.hour < 9 ) & df['inc_cr_date'].dt.minute < 30)],


                           [(df['inc_cr_date']+pd.Timedelta('2 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
                            (df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
                            (df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
                            (df['inc_cr_date']+pd.Timedelta('0 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
                            (df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),                            
                            (df['inc_cr_date']+pd.Timedelta('0 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
                            (df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
                            (df['inc_cr_date']+pd.Timedelta('0 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
                            (df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
                            (df['inc_cr_date']+pd.Timedelta('0 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
                            (df['inc_cr_date']+pd.Timedelta('3 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
                            (df['inc_cr_date']+pd.Timedelta('0 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes')],

                           df['inc_cr_date'])

Output (incorrect):

inc_cr_date,inc_cr_date_day,inc_cr_date_adjusted
2017-10-26 21:59:28.075,Thursday,2017-10-26 09:30:00.000 #nok, adjusted should be 2017-10-27 and not 2017-10-26.
2017-10-21 16:49:58.722,Saturday,2017-10-23 09:30:00.000 #ok
2017-10-11 09:30:05.258,Wednesday,2017-10-11 09:30:00.000 #nok, in such situation the adjusted date should be same as inc_cr_date

Any input is much appreciated.

1 answer:

Answer 0 (score: 1)

As programmers, we should keep repetition to a minimum (the DRY principle). We can use .isin to get the result you want, i.e.

# The repeated weekday conditions can be reduced to shared masks and results
days_one = ['Monday','Tuesday','Wednesday','Thursday']
days_two = days_one + ['Friday']

# Returns a boolean mask 
m1 = df['inc_cr_date_day'].isin(days_one) & (df['inc_cr_date'].dt.hour > 18 ) & (df['inc_cr_date'].dt.minute > 30)
m2 = df['inc_cr_date_day'].isin(days_two) & (df['inc_cr_date'].dt.hour < 9 ) & (df['inc_cr_date'].dt.minute < 30)

# Repeated result can be stored in one variable 
r1 = (df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes')
r2 = (df['inc_cr_date']+pd.Timedelta('0 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes')


df['inc_cr_date_adjusted'] = np.select([
                          m1, m2,      
                          (df['inc_cr_date_day'] == 'Saturday'),
                          (df['inc_cr_date_day'] == 'Sunday'),
                          ((df['inc_cr_date_day'] == 'Friday') & (df['inc_cr_date'].dt.hour > 18) & (df['inc_cr_date'].dt.minute > 30)),
                          ],
                          [r1, r2,
                          (df['inc_cr_date']+pd.Timedelta('2 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),
                          (df['inc_cr_date']+pd.Timedelta('1 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes'),                           
                          (df['inc_cr_date']+pd.Timedelta('3 days')).dt.normalize() + pd.Timedelta('9 Hours 30 Minutes')
                          ],
                          df['inc_cr_date'])

Output:

               inc_cr_date inc_cr_date_day    inc_cr_date_adjusted
0 2017-10-26 21:59:28.075        Thursday 2017-10-27 09:30:00.000
1 2017-10-21 16:49:58.722        Saturday 2017-10-23 09:30:00.000
2 2017-10-11 09:30:05.258       Wednesday 2017-10-11 09:30:05.258
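One caveat, assuming the intent is "before 09:30" and "after 18:30": the masks above test the hour and the minute separately, so a time such as 18:45 (hour not greater than 18) or 19:10 (minute not greater than 30) does not match. A minimal sketch of one alternative is to compare minutes since midnight instead. The sample timestamps and variable names below are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical sample timestamps, for illustration only.
ts = pd.Series(pd.to_datetime([
    "2017-10-26 18:45:00",  # after 18:30, but hour is not > 18
    "2017-10-26 19:10:00",  # after 18:30, but minute is not > 30
    "2017-10-26 08:45:00",  # before 09:30, but minute is not < 30
    "2017-10-26 12:00:00",  # inside working hours
]))

# Separate hour/minute tests, in the style of the masks above.
after_sep = (ts.dt.hour > 18) & (ts.dt.minute > 30)
before_sep = (ts.dt.hour < 9) & (ts.dt.minute < 30)

# Whole time-of-day test (assumed cutoffs: 09:30 and 18:30).
# Seconds are ignored, so 09:30:05 counts as 09:30 and is not "before opening".
minutes = ts.dt.hour * 60 + ts.dt.minute
after_close = minutes > 18 * 60 + 30
before_open = minutes < 9 * 60 + 30

print(after_sep.tolist(), before_sep.tolist())
print(after_close.tolist(), before_open.tolist())
```

If this matches your intent, after_close and before_open could replace the hour and minute parts of m1 and m2, with the weekday tests left unchanged.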

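In general, having many conditions can create ambiguity, because a row may match more than one of them. I hope the code above helps you get the desired result. Looking at your code, operator precedence in the conditions also matters: & binds more tightly than > and <, so wrap every comparison in parentheses, i.e.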

((df['inc_cr_date_day'] == 'Monday')& (df['inc_cr_date'].dt.hour > 18 ) & (df['inc_cr_date'].dt.minute > 30))
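To see why the parentheses matter, here is a small sketch with hypothetical sample data. Without them, Python evaluates the & operators first and only then compares the combined value with 30, so the condition never holds. It uses .dt.day_name(), which recent pandas versions provide in place of .dt.weekday_name.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
s = pd.Series(pd.to_datetime([
    "2017-10-23 19:45:00",  # Monday, hour 19, minute 45
    "2017-10-23 10:45:00",  # Monday, hour 10, minute 45
]))

is_monday = s.dt.day_name() == "Monday"
late_hour = s.dt.hour > 18

# Parsed as ((is_monday & late_hour) & s.dt.minute) > 30.
# The left-hand side is at most 1 (or True), so it is never greater than 30.
unparenthesized = is_monday & late_hour & s.dt.minute > 30

# Each comparison is evaluated first, then the boolean masks are combined.
parenthesized = is_monday & late_hour & (s.dt.minute > 30)

print(unparenthesized.tolist())
print(parenthesized.tolist())
```

This is consistent with what the question reports: the weekday conditions with hour and minute tests never fire, so those rows fall through to a later condition or to the default.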