Business hours between two dates in a pandas DataFrame (including holidays)

Time: 2017-10-23 22:46:33

Tags: python pandas date

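Novice Python user here. I am trying to calculate the business hours between two dates in a pandas DataFrame, counting 9am to 5pm, Monday to Friday, and excluding Australian public holidays.

I have spent the past few days trying to hack together various solutions and apply them to my problem, but I am having a lot of trouble.

I will post my current iteration, but I am also looking for feedback on the best way to handle this overall, and on how to approach problems like this in the future.

My latest attempt uses pandas CDay together with a custom holiday calendar for the Australian dates, and that part seems to work. What I am struggling with is applying it to the pandas dates from that step onward. I am using the custom function from this solution, https://codereview.stackexchange.com/questions/135142/calculate-working-minutes-between-two-timestamps/135200#135200, to count the minutes between the dates, but without any luck.

Any help appreciated!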

import datetime

import numpy as np
import pandas as pd
from pandas.tseries.holiday import Holiday, AbstractHolidayCalendar
from pandas.tseries.offsets import CDay

class HolidayCalendar(AbstractHolidayCalendar):
    rules =[Holiday('New Years Day',year=2016,month=1,day=1),
        Holiday('Australia Day',year=2016,month=1,day=26),
        Holiday('Good Friday',year=2016,month=3,day=25),
        Holiday('Easter Monday',year=2016,month=3,day=28),
        Holiday('ANZAC Day',year=2016,month=4,day=25),
        Holiday('Queens Birthday',year=2016,month=6,day=13),
        Holiday('Christmas Day',year=2016,month=12,day=25),
        Holiday('Boxing Day',year=2016,month=12,day=26),           
        Holiday('New Years Day',year=2017,month=1,day=1),
        Holiday('Australia Day',year=2017,month=1,day=26),
        Holiday('Good Friday',year=2017,month=4,day=14),
        Holiday('Easter Monday',year=2017,month=4,day=17),
        Holiday('ANZAC Day',year=2017,month=4,day=25),
        Holiday('Queens Birthday',year=2017,month=6,day=12),
        Holiday('Christmas Day',year=2017,month=12,day=25),
        Holiday('Boxing Day',year=2017,month=12,day=26),
        Holiday('New Years Day',year=2018,month=1,day=1),
        Holiday('Australia Day',year=2018,month=1,day=26),
        Holiday('Good Friday',year=2018,month=3,day=30),
        Holiday('Easter Monday',year=2018,month=4,day=2),
        Holiday('ANZAC Day',year=2018,month=4,day=25),
        Holiday('Queens Birthday',year=2018,month=6,day=11),
        Holiday('Christmas Day',year=2018,month=12,day=25),
        Holiday('Boxing Day',year=2018,month=12,day=26)]

cal = HolidayCalendar()
dayindex = pd.bdate_range(datetime.date(2015,1,1),datetime.date.today(),freq=CDay(calendar=cal))

day_series = dayindex.to_series()

def count_mins(start, end):
    """Working minutes (09:00-17:00 on business days) between two epoch-millisecond values."""
    # Convert epoch milliseconds back to naive timestamps; unlike
    # datetime.fromtimestamp, this does not shift by the local time zone.
    starttime = pd.Timestamp(int(start), unit='ms')
    endtime = pd.Timestamp(int(end), unit='ms')

    # Business days (weekends and holidays excluded) from the start date to the end date.
    days = day_series[starttime.normalize():endtime.normalize()]
    daycount = len(days)
    if daycount == 0:
        return daycount

    # Opening time on the first business day and closing time on the last one.
    startday = days.iloc[0].replace(hour=9, minute=0)
    endday = days.iloc[-1].replace(hour=17, minute=0)

    if daycount == 1:
        # Clip both ends to the 09:00-17:00 window of that single day.
        periodstart = max(starttime, startday)
        periodend = min(endtime, endday)
        # total_seconds() keeps the sign, so a span outside the window counts as 0.
        return max((periodend - periodstart).total_seconds(), 0) / 60

    # Two or more business days: partial first and last days,
    # plus 480 minutes for every full day in between.
    if starttime < startday:
        first_day_mins = 480
    else:
        first_day_mins = max((startday.replace(hour=17) - starttime).total_seconds(), 0) / 60
    if endtime > endday:
        last_day_mins = 480
    else:
        last_day_mins = max((endtime - endday.replace(hour=9)).total_seconds(), 0) / 60

    return first_day_mins + last_day_mins + (daycount - 2) * 480


df_updated['Created Date'] = pd.to_datetime(df_updated['Created Date'])
df_updated['Updated Date'] = pd.to_datetime(df_updated['Updated Date'])
df_updated['Created Date'] = df_updated['Created Date'].astype(np.int64) / int(1e6)
df_updated['Updated Date'] = df_updated['Updated Date'].astype(np.int64) / int(1e6)

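# count_mins expects scalar values, so apply it row by row rather than passing whole columns.
df_updated.apply(lambda row: count_mins(row['Created Date'], row['Updated Date']), axis=1)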

2 answers:

Answer 0 (score: 0)

You can use the length of bdate_range:

In [11]: pd.bdate_range('2017-01-01', '2017-10-23')
Out[11]:
DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04', '2017-01-05',
               '2017-01-06', '2017-01-09', '2017-01-10', '2017-01-11',
               '2017-01-12', '2017-01-13',
               ...
               '2017-10-10', '2017-10-11', '2017-10-12', '2017-10-13',
               '2017-10-16', '2017-10-17', '2017-10-18', '2017-10-19',
               '2017-10-20', '2017-10-23'],
              dtype='datetime64[ns]', length=211, freq='B')

In [12]: len(pd.bdate_range('2017-01-01', '2017-10-23'))
Out[12]: 211
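
The count above skips weekends only. A minimal sketch of how the same idea might account for public holidays, assuming the holiday dates are supplied by hand: `holidays_2017` is a hypothetical, hard-coded list used for illustration, and the 8-hour multiplier assumes a 09:00 to 17:00 day with whole-day endpoints.

```python
import pandas as pd

# Hypothetical list of 2017 public holidays that fall on weekdays,
# hard-coded here for illustration only.
holidays_2017 = ['2017-01-26', '2017-04-14', '2017-04-17',
                 '2017-04-25', '2017-06-12']

# Plain business days (Monday to Friday), as in the answer above.
plain = pd.bdate_range('2017-01-01', '2017-10-23')

# freq='C' switches to custom business days, which also drop the holidays.
custom = pd.bdate_range('2017-01-01', '2017-10-23',
                        freq='C', holidays=holidays_2017)

business_days = len(custom)
# Assumes an 8-hour working day and that both endpoints are whole days.
business_hours = business_days * 8

print(len(plain), business_days, business_hours)  # 211 206 1648
```

This only counts whole days, so it does not handle a start or end time that falls partway through a working day.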

Answer 1 (score: 0)

Try the package named business-duration on PyPI:

pip install business-duration

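
If adding a dependency is not an option, the calculation can also be sketched with pandas alone. The function below is an illustration, not the API of the package mentioned above: `business_minutes` and its parameters are hypothetical names, and it assumes naive (time-zone-free) timestamps and the same opening hours on every business day. It sums, for each custom business day, the overlap between the interval and that day's opening window.

```python
import pandas as pd


def business_minutes(start, end, holidays=(), open_hour=9, close_hour=17):
    """Sum the overlap of [start, end] with each business day's opening window."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    # Custom business days between the two calendar dates; weekends and
    # the supplied holidays are skipped.
    days = pd.bdate_range(start.normalize(), end.normalize(),
                          freq='C', holidays=list(holidays))
    total = pd.Timedelta(0)
    for day in days:
        window_open = day + pd.Timedelta(hours=open_hour)
        window_close = day + pd.Timedelta(hours=close_hour)
        # Portion of the interval that falls inside this day's window.
        overlap = min(end, window_close) - max(start, window_open)
        if overlap > pd.Timedelta(0):
            total += overlap
    return total.total_seconds() / 60


# Australia Day 2017 (Thursday 26 January) is skipped:
# 60 minutes on the 25th plus 60 minutes on the 27th.
print(business_minutes('2017-01-25 16:00', '2017-01-27 10:00',
                       holidays=['2017-01-26']))  # 120.0
```

On a DataFrame this could be applied row by row, for example with `df.apply(lambda r: business_minutes(r['Created Date'], r['Updated Date'], holidays), axis=1)`, where `holidays` is whatever list of dates applies to your state. The loop is easy to read but not vectorized, so it may be slow on very large frames.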