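I need to create a calendar in pandas from a list of dates. The calendar must be of type DateOffset so that I can pass it to every pandas API that accepts a time frequency argument (e.g. date_range).
The input looks like this: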
dates = ['2011-01-01', '2011-01-02', '2011-01-03']
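Answer 0 (score: 0)
Here is a function that infers a calendar (a pandas DateOffset) from a given list of dates. The returned calendar can be passed to any pandas API that accepts a time frequency argument.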
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay
def infer_calendar(dates):
    """
    Infer a calendar as pandas DateOffset from a list of dates.

    Parameters
    ----------
    dates : array-like (1-dimensional) or pd.DatetimeIndex
        The dates you want to build a calendar from

    Returns
    -------
    calendar : pd.DateOffset (CustomBusinessDay)
    """
    dates = pd.DatetimeIndex(dates)

    traded_weekdays = []
    holidays = []

    days_of_the_week = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
    for day, day_str in enumerate(days_of_the_week):

        weekday_mask = (dates.dayofweek == day)

        # keep only days of the week that are present
        if not weekday_mask.any():
            continue
        traded_weekdays.append(day_str)

        # look for holidays
        used_weekdays = dates[weekday_mask].normalize()
        all_weekdays = pd.date_range(dates.min(), dates.max(),
                                     freq=CustomBusinessDay(weekmask=day_str)
                                     ).normalize()
        _holidays = all_weekdays.difference(used_weekdays)
        _holidays = [timestamp.date() for timestamp in _holidays]
        holidays.extend(_holidays)

    traded_weekdays = ' '.join(traded_weekdays)
    return CustomBusinessDay(weekmask=traded_weekdays, holidays=holidays)
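The function relies on two CustomBusinessDay arguments: weekmask selects which weekdays count as valid, and holidays removes individual dates. As a minimal sketch of that mechanism on its own, the example below builds a small calendar by hand; the Monday-to-Friday week and the holiday on 2011-01-05 are made-up values chosen only for illustration.

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

# Hypothetical calendar: Monday to Friday, with Wednesday 2011-01-05
# declared a holiday (illustrative values, not taken from the answer).
cal = CustomBusinessDay(weekmask='Mon Tue Wed Thu Fri',
                        holidays=['2011-01-05'])

# Used as a frequency, the offset skips the holiday.
idx = pd.date_range('2011-01-03', '2011-01-07', freq=cal)
print(idx.strftime('%Y-%m-%d').tolist())
# ['2011-01-03', '2011-01-04', '2011-01-06', '2011-01-07']

# Used in date arithmetic, one step from Tuesday lands on Thursday.
next_day = pd.Timestamp('2011-01-04') + cal
print(next_day.date())  # 2011-01-06
```

infer_calendar builds exactly this kind of object, deriving the weekmask and the holiday list from the input dates instead of hard-coding them.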
Here are some tests:
from pandas.tseries.offsets import BDay, Day
from pandas.tseries.holiday import USFederalHolidayCalendar
if __name__ == "__main__":

    print("Test 1")
    dates = ['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
             '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',
             '2011-01-09', '2011-01-10', '2011-01-11', '2011-01-12',
             '2011-01-13', '2011-01-14', '2011-01-15']
    calendar = infer_calendar(dates)
    print("dates:", dates)
    print("calendar:", pd.date_range(dates[0], dates[-1], freq=calendar))

    print("Test 2")
    dates = pd.DatetimeIndex(
        ['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
         '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',
         '2011-01-09', '2011-01-10', '2011-01-11', '2011-01-12',
         '2011-01-13', '2011-01-14', '2011-01-15']
    )
    calendar = infer_calendar(dates)
    print("dates:", dates)
    print("calendar:", pd.date_range(dates[0], dates[-1], freq=calendar))

    print("Test 3")
    us_bd = CustomBusinessDay(calendar=USFederalHolidayCalendar())
    # pd.DatetimeIndex(start=..., end=..., freq=...) is no longer supported
    # in current pandas; pd.date_range builds the same index.
    dates = pd.date_range(start='2011-01-01', end='2011-01-18', freq=us_bd)
    calendar = infer_calendar(dates)
    print("dates:", dates)
    print("calendar:", pd.date_range(dates[0], dates[-1], freq=calendar))

    print("Test 4")
    dates = pd.date_range('2011-01-01', '2011-01-15', freq=Day())
    calendar = infer_calendar(dates)
    print("dates:", dates)
    print("calendar:", pd.date_range(dates[0], dates[-1], freq=calendar))

    print("Test 5")
    dates = pd.date_range('2011-01-01', '2011-01-15', freq=BDay())
    calendar = infer_calendar(dates)
    print("dates:", dates)
    print("calendar:", pd.date_range(dates[0], dates[-1], freq=calendar))
Results:
Test 1
dates: ['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08', '2011-01-09', '2011-01-10', '2011-01-11', '2011-01-12', '2011-01-13', '2011-01-14', '2011-01-15']
calendar: DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
'2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',
'2011-01-09', '2011-01-10', '2011-01-11', '2011-01-12',
'2011-01-13', '2011-01-14', '2011-01-15'],
dtype='datetime64[ns]', freq='C')
Test 2
dates: DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
'2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',
'2011-01-09', '2011-01-10', '2011-01-11', '2011-01-12',
'2011-01-13', '2011-01-14', '2011-01-15'],
dtype='datetime64[ns]', freq=None)
calendar: DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
'2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',
'2011-01-09', '2011-01-10', '2011-01-11', '2011-01-12',
'2011-01-13', '2011-01-14', '2011-01-15'],
dtype='datetime64[ns]', freq='C')
Test 3
dates: DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
'2011-01-07', '2011-01-10', '2011-01-11', '2011-01-12',
'2011-01-13', '2011-01-14', '2011-01-18'],
dtype='datetime64[ns]', freq='C')
calendar: DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
'2011-01-07', '2011-01-10', '2011-01-11', '2011-01-12',
'2011-01-13', '2011-01-14', '2011-01-18'],
dtype='datetime64[ns]', freq='C')
Test 4
dates: DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
'2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',
'2011-01-09', '2011-01-10', '2011-01-11', '2011-01-12',
'2011-01-13', '2011-01-14', '2011-01-15'],
dtype='datetime64[ns]', freq='D')
calendar: DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
'2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',
'2011-01-09', '2011-01-10', '2011-01-11', '2011-01-12',
'2011-01-13', '2011-01-14', '2011-01-15'],
dtype='datetime64[ns]', freq='C')
Test 5
dates: DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
'2011-01-07', '2011-01-10', '2011-01-11', '2011-01-12',
'2011-01-13', '2011-01-14'],
dtype='datetime64[ns]', freq='B')
calendar: DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
'2011-01-07', '2011-01-10', '2011-01-11', '2011-01-12',
'2011-01-13', '2011-01-14'],
dtype='datetime64[ns]', freq='C')
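The tests above print both indexes and leave the comparison to the reader. A round-trip check can also be written as an assertion. The sketch below is self-contained, so it uses a compact stand-in named infer_calendar_compact (a hypothetical helper, not the function from the answer) that follows the same idea: weekdays that never occur are left out of the weekmask, and missing dates on weekdays that do occur become holidays. The input dates are invented for the example, with Wednesday 2011-01-12 deliberately omitted.

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

def infer_calendar_compact(dates):
    """Hypothetical compact variant of the calendar inference idea."""
    dates = pd.DatetimeIndex(dates).normalize()
    names = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

    # Weekdays (0 = Monday) that occur at least once in the input.
    present = sorted({int(d) for d in dates.dayofweek})
    weekmask = ' '.join(names[d] for d in present)

    # Every calendar day in the range that falls on a present weekday
    # but is absent from the input is treated as a holiday.
    all_days = pd.date_range(dates.min(), dates.max(), freq='D')
    candidates = all_days[all_days.dayofweek.isin(present)]
    holidays = [ts.date() for ts in candidates.difference(dates)]

    return CustomBusinessDay(weekmask=weekmask, holidays=holidays)

# Made-up input: two working weeks with Wednesday 2011-01-12 missing.
dates = ['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
         '2011-01-07', '2011-01-10', '2011-01-11', '2011-01-13',
         '2011-01-14']

cal = infer_calendar_compact(dates)
rebuilt = pd.date_range(dates[0], dates[-1], freq=cal)

# Round trip: regenerating the range reproduces the input exactly.
assert rebuilt.strftime('%Y-%m-%d').tolist() == dates
```

Comparing formatted strings rather than the index objects keeps the check independent of the index's freq attribute and datetime resolution.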