How to filter two DatetimeIndexes?

Posted: 2017-09-16 19:42:19

Tags: python pandas

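I have two DatetimeIndexes: one is a business-day date_range, the other is a list of holidays.

I filter the holiday list by the start and end dates. Now I need to combine the two and remove any dates that appear in both (days that are both holidays and trading days).

Finally, I need to convert the date range into a list of strings formatted as yyyy_mm_dd that I can iterate over later.

Here is my code so far: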

import datetime
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday, nearest_workday, \
    USMartinLutherKingJr, USPresidentsDay, GoodFriday, USMemorialDay, \
    USLaborDay, USThanksgivingDay

class USTradingCalendar(AbstractHolidayCalendar):
    rules = [
        Holiday('NewYearsDay', month=1, day=1, observance=nearest_workday),
        USMartinLutherKingJr,
        USPresidentsDay,
        GoodFriday,
        USMemorialDay,
        Holiday('USIndependenceDay', month=7, day=4, observance=nearest_workday),
        USLaborDay,
        USThanksgivingDay,
        Holiday('Christmas', month=12, day=25, observance=nearest_workday)
    ]

def get_trading_close_holidays(year):
    inst = USTradingCalendar()  
    return inst.holidays(datetime.datetime(year-1, 12, 31), 
                         datetime.datetime(year, 12, 31))

start_date = "2017_07_01"
end_date = "2017_08_31"

start_date = datetime.datetime.strptime(start_date,"%Y_%m_%d").date()
end_date = datetime.datetime.strptime(end_date,"%Y_%m_%d").date()

date_range = pd.bdate_range(start=start_date, end=end_date, name="trading_days")
holidays = get_trading_close_holidays(start_date.year)
holidays = holidays.where((holidays.date > start_date) & 
                          (holidays.date < end_date))
holidays = holidays.dropna(how = 'any')
date_range = date_range.where(~(date_range.trading_days.isin(holidays)))

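1 Answer:

Answer 0 (score: 0)

Consider filtering with a boolean mask that keeps only the trading days not found in the holiday index: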

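# Replaces the last line of the question's code: a DatetimeIndex has no
# .trading_days attribute (that is only its name), so use isin() directly.
date_range = date_range[~date_range.isin(holidays)]
print(date_range)   # the 2017-07-04 holiday no longer appears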

# DatetimeIndex(['2017-07-03', '2017-07-05', '2017-07-06', '2017-07-07',
#                '2017-07-10', '2017-07-11', '2017-07-12', '2017-07-13',
#                '2017-07-14', '2017-07-17', '2017-07-18', '2017-07-19',
#                '2017-07-20', '2017-07-21', '2017-07-24', '2017-07-25',
#                '2017-07-26', '2017-07-27', '2017-07-28', '2017-07-31',
#                '2017-08-01', '2017-08-02', '2017-08-03', '2017-08-04',
#                '2017-08-07', '2017-08-08', '2017-08-09', '2017-08-10',
#                '2017-08-11', '2017-08-14', '2017-08-15', '2017-08-16',
#                '2017-08-17', '2017-08-18', '2017-08-21', '2017-08-22',
#                '2017-08-23', '2017-08-24', '2017-08-25', '2017-08-28',
#                '2017-08-29', '2017-08-30', '2017-08-31'],
#               dtype='datetime64[ns]', name='trading_days', freq=None)
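A minimal self-contained sketch of the same removal step, using a hypothetical holiday list for illustration. It suggests that with isin() the holiday list does not need to be pre-filtered by start and end date (holidays on weekends or outside the range simply never match), and that DatetimeIndex.difference() yields the same dates as a set operation:

```python
import pandas as pd

# Business days (Mon-Fri) for the period in the question.
trading_days = pd.bdate_range("2017-07-01", "2017-08-31", name="trading_days")

# Hypothetical, unfiltered holiday list: 2017-07-01 is a Saturday and
# 2017-09-04 lies after the end date, so neither can match a trading day.
holidays = pd.to_datetime(["2017-07-01", "2017-07-04", "2017-09-04"])

# Boolean mask: keep every trading day that is not a holiday.
kept = trading_days[~trading_days.isin(holidays)]

# Set difference: the same dates, returned as a sorted index.
kept_diff = trading_days.difference(holidays)

print(len(trading_days), len(kept))  # 44 43
print(list(kept) == list(kept_diff))  # True
```

The boolean mask keeps the original order and the index name; difference() is convenient when you think of the two indexes as sets.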

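Use astype() to convert the dates to a string array, then tolist() to turn it into a list. Note that this produces the ISO format yyyy-mm-dd (with hyphens):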

strdates = date_range.date.astype('str').tolist()
print(strdates)

# ['2017-07-03', '2017-07-05', '2017-07-06', '2017-07-07', '2017-07-10', 
#  '2017-07-11', '2017-07-12', '2017-07-13', '2017-07-14', '2017-07-17', 
#  '2017-07-18', '2017-07-19', '2017-07-20', '2017-07-21', '2017-07-24', 
#  '2017-07-25', '2017-07-26', '2017-07-27', '2017-07-28', '2017-07-31', 
#  '2017-08-01', '2017-08-02', '2017-08-03', '2017-08-04', '2017-08-07', 
#  '2017-08-08', '2017-08-09', '2017-08-10', '2017-08-11', '2017-08-14', 
#  '2017-08-15', '2017-08-16', '2017-08-17', '2017-08-18', '2017-08-21', 
#  '2017-08-22', '2017-08-23', '2017-08-24', '2017-08-25', '2017-08-28', 
#  '2017-08-29', '2017-08-30', '2017-08-31']
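The question asks for yyyy_mm_dd strings. A possible end-to-end alternative, sketched under the assumption that the question's USTradingCalendar rules are the ones wanted: pandas' CustomBusinessDay offset accepts a holiday calendar, so date_range can generate the trading days directly, and DatetimeIndex.strftime() applies the underscore format. The helper name trading_day_strings is hypothetical, not part of pandas or the original post.

```python
import pandas as pd
from pandas.tseries.holiday import (
    AbstractHolidayCalendar, Holiday, nearest_workday,
    USMartinLutherKingJr, USPresidentsDay, GoodFriday, USMemorialDay,
    USLaborDay, USThanksgivingDay,
)
from pandas.tseries.offsets import CustomBusinessDay


class USTradingCalendar(AbstractHolidayCalendar):
    # Same rules as the calendar defined in the question.
    rules = [
        Holiday("NewYearsDay", month=1, day=1, observance=nearest_workday),
        USMartinLutherKingJr,
        USPresidentsDay,
        GoodFriday,
        USMemorialDay,
        Holiday("USIndependenceDay", month=7, day=4, observance=nearest_workday),
        USLaborDay,
        USThanksgivingDay,
        Holiday("Christmas", month=12, day=25, observance=nearest_workday),
    ]


def trading_day_strings(start, end):
    """Hypothetical helper: trading days from start to end (inclusive) as 'yyyy_mm_dd'."""
    # Business-day offset that skips weekends and every holiday in the calendar.
    trading_day = CustomBusinessDay(calendar=USTradingCalendar())
    days = pd.date_range(start=start, end=end, freq=trading_day)
    # strftime formats each date; tolist() returns a plain Python list of str.
    return days.strftime("%Y_%m_%d").tolist()


strdates = trading_day_strings("2017-07-01", "2017-08-31")
print(strdates[:3])  # ['2017_07_03', '2017_07_05', '2017_07_06']
print(len(strdates))  # 43
```

Because the offset skips holidays while the range is being generated, there are no overlapping dates to remove afterwards.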