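Create a calendar (DateOffset) from a list of dates using pandas

Time: 2019-01-03 14:02:23

Tags: python pandas

I need to create a calendar in pandas from a list of dates. The calendar must be of type DateOffset so that I can pass it to any pandas API that accepts a time-frequency argument (for example, date_range).

The input looks like this: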

dates = ['2011-01-01', '2011-01-02', '2011-01-03']

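1 answer:

Answer 0: (score: 0)

Here is a function that infers a calendar (a pandas DateOffset) from a given list of dates. Weekdays that never occur in the input are excluded from the week mask, and any occurrence of a used weekday that is missing between the first and last date is recorded as a holiday. You can pass the returned calendar to any pandas API that accepts a time-frequency argument.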

import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

def infer_calendar(dates):
    """
    Infer a calendar as pandas DateOffset from a list of dates.
    Parameters
    ----------
    dates : array-like (1-dimensional) or pd.DatetimeIndex
        The dates you want to build a calendar from
    Returns
    -------
    calendar : pd.DateOffset (CustomBusinessDay)
    """
    dates = pd.DatetimeIndex(dates)

    traded_weekdays = []
    holidays = []

    days_of_the_week = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
    for day, day_str in enumerate(days_of_the_week):

        weekday_mask = (dates.dayofweek == day)

        # keep only days of the week that are present
        if not weekday_mask.any():
            continue
        traded_weekdays.append(day_str)

        # look for holidays
        used_weekdays = dates[weekday_mask].normalize()
        all_weekdays = pd.date_range(dates.min(), dates.max(),
                                     freq=CustomBusinessDay(weekmask=day_str)
                                     ).normalize()
        _holidays = all_weekdays.difference(used_weekdays)
        _holidays = [timestamp.date() for timestamp in _holidays]
        holidays.extend(_holidays)

    traded_weekdays = ' '.join(traded_weekdays)
    return CustomBusinessDay(weekmask=traded_weekdays, holidays=holidays)
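
To make the holiday-detection step inside the loop easier to follow, here is a minimal sketch of that step for a single weekday. The input dates are hypothetical (every Monday in January 2011 except 2011-01-17); the calls are the same ones the function above uses.

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

# Hypothetical input: every Monday in January 2011 except 2011-01-17.
dates = pd.DatetimeIndex(['2011-01-03', '2011-01-10', '2011-01-24', '2011-01-31'])

# Every Monday between the first and last observed date.
all_mondays = pd.date_range(dates.min(), dates.max(),
                            freq=CustomBusinessDay(weekmask='Mon'))

# Mondays the week mask allows but that never occur are treated as holidays.
missing = all_mondays.difference(dates)
holidays = [ts.date() for ts in missing]
print(holidays)

# An offset built from those pieces steps over the missing Monday.
calendar = CustomBusinessDay(weekmask='Mon', holidays=holidays)
print(pd.Timestamp('2011-01-10') + calendar)
```

Because holidays are only searched for between dates.min() and dates.max(), the resulting calendar knows nothing about closures outside that span; there it falls back to the week mask alone.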

Here are some tests:

from pandas.tseries.offsets import BDay, Day
from pandas.tseries.holiday import USFederalHolidayCalendar

if __name__ == "__main__":
    print("Test 1")
    dates = ['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
             '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',
             '2011-01-09', '2011-01-10', '2011-01-11', '2011-01-12',
             '2011-01-13', '2011-01-14', '2011-01-15']
    calendar = infer_calendar(dates)
    print("dates:", dates)
    print("calendar:", pd.date_range(dates[0], dates[-1], freq=calendar) )

    print("Test 2")
    dates = pd.DatetimeIndex(
             ['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
             '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',
             '2011-01-09', '2011-01-10', '2011-01-11', '2011-01-12',
             '2011-01-13', '2011-01-14', '2011-01-15']
             )
    calendar = infer_calendar(dates)
    print("dates:", dates)
    print("calendar:", pd.date_range(dates[0], dates[-1], freq=calendar) )

    print("Test 3")
    us_bd = CustomBusinessDay(calendar=USFederalHolidayCalendar())
    # pd.DatetimeIndex(start=..., end=..., freq=...) was removed in pandas 1.0; use date_range
    dates = pd.date_range(start='2011-01-01', end='2011-01-18', freq=us_bd)
    calendar = infer_calendar(dates)
    print("dates:", dates)
    print("calendar:", pd.date_range(dates[0], dates[-1], freq=calendar) )

    print("Test 4")
    dates = pd.date_range('2011-01-01', '2011-01-15', freq=Day())
    calendar = infer_calendar(dates)
    print("dates:", dates)
    print("calendar:", pd.date_range(dates[0], dates[-1], freq=calendar) )

    print("Test 5")
    dates = pd.date_range('2011-01-01', '2011-01-15', freq=BDay())
    calendar = infer_calendar(dates)
    print("dates:", dates)
    print("calendar:", pd.date_range(dates[0], dates[-1], freq=calendar) )

Results:

Test 1
dates: ['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08', '2011-01-09', '2011-01-10', '2011-01-11', '2011-01-12', '2011-01-13', '2011-01-14', '2011-01-15']
calendar: DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
               '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',
               '2011-01-09', '2011-01-10', '2011-01-11', '2011-01-12',
               '2011-01-13', '2011-01-14', '2011-01-15'],
              dtype='datetime64[ns]', freq='C')
Test 2
dates: DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
               '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',
               '2011-01-09', '2011-01-10', '2011-01-11', '2011-01-12',
               '2011-01-13', '2011-01-14', '2011-01-15'],
              dtype='datetime64[ns]', freq=None)
calendar: DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
               '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',
               '2011-01-09', '2011-01-10', '2011-01-11', '2011-01-12',
               '2011-01-13', '2011-01-14', '2011-01-15'],
              dtype='datetime64[ns]', freq='C')
Test 3
dates: DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
               '2011-01-07', '2011-01-10', '2011-01-11', '2011-01-12',
               '2011-01-13', '2011-01-14', '2011-01-18'],
              dtype='datetime64[ns]', freq='C')
calendar: DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
               '2011-01-07', '2011-01-10', '2011-01-11', '2011-01-12',
               '2011-01-13', '2011-01-14', '2011-01-18'],
              dtype='datetime64[ns]', freq='C')
Test 4
dates: DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
               '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',
               '2011-01-09', '2011-01-10', '2011-01-11', '2011-01-12',
               '2011-01-13', '2011-01-14', '2011-01-15'],
              dtype='datetime64[ns]', freq='D')
calendar: DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
               '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',
               '2011-01-09', '2011-01-10', '2011-01-11', '2011-01-12',
               '2011-01-13', '2011-01-14', '2011-01-15'],
              dtype='datetime64[ns]', freq='C')
Test 5
dates: DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
               '2011-01-07', '2011-01-10', '2011-01-11', '2011-01-12',
               '2011-01-13', '2011-01-14'],
              dtype='datetime64[ns]', freq='B')
calendar: DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
               '2011-01-07', '2011-01-10', '2011-01-11', '2011-01-12',
               '2011-01-13', '2011-01-14'],
              dtype='datetime64[ns]', freq='C')
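
The tests above rely on comparing the two printed indexes by eye. As a rough sketch, the same comparison could be automated with a small helper. The name reproduces is hypothetical and not part of the original answer, and the calendar below is built by hand as a stand-in for whatever infer_calendar returns.

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

def reproduces(dates, calendar):
    """Return True if `calendar` regenerates exactly `dates` over their own span."""
    dates = pd.DatetimeIndex(dates).normalize().unique().sort_values()
    regenerated = pd.date_range(dates.min(), dates.max(), freq=calendar)
    # Compare element by element so the index's freq attribute is ignored.
    return list(regenerated) == list(dates)

# Thu, Fri, Tue, Wed: the weekend and Monday 2011-01-17 are absent.
dates = ['2011-01-13', '2011-01-14', '2011-01-18', '2011-01-19']

# Hand-built stand-in for an inferred calendar (assumption for illustration).
with_holiday = CustomBusinessDay(weekmask='Mon Tue Wed Thu Fri',
                                 holidays=['2011-01-17'])
without_holiday = CustomBusinessDay(weekmask='Mon Tue Wed Thu Fri')

print(reproduces(dates, with_holiday))     # the holiday explains the gap
print(reproduces(dates, without_holiday))  # 2011-01-17 would be regenerated
```

Replacing the print calls in each test with an assertion on such a helper would make a regression show up as a failure rather than as a visual difference.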