Range

Time: 2018-03-14 08:17:38

Tags: python-2.7 amazon-web-services pyspark aws-glue

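I need to get the list of holidays within a given range, i.e., if the start date is 20/12/2016 and the end date is 10/1/2017, I should get 25/12/2016 and 1/1/2017. I can do this with Pandas, but in my case there is a constraint: I need to use the AWS Glue service, and AWS Glue does not support Pandas.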

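I am trying to use the native Python library holidays, but I cannot find anything in its API documentation for fetching the holidays between a from date and a to date.

Here is my attempt: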

import holidays
import datetime
from datetime import date, timedelta
import dateutil
from dateutil.relativedelta import relativedelta

us_holidays = holidays.UnitedStates()

for date2,name in sorted(holidays.US(state='CA', years=2013).items()):
    print (date2,name)


days_from_closest_holiday = [(abs(fdate - hdate)).days for hdate in holidays.date.tolist()]
print days_from_closest_holiday

Output of us_holidays:

(datetime.date(2013, 1, 1), "New Year's Day")
(datetime.date(2013, 1, 21), 'Martin Luther King, Jr. Day')
(datetime.date(2013, 2, 18), "Washington's Birthday")
(datetime.date(2013, 3, 31), 'C\xc3\xa9sar Ch\xc3\xa1vez Day')
(datetime.date(2013, 4, 1), 'C\xc3\xa9sar Ch\xc3\xa1vez Day (Observed)')
(datetime.date(2013, 5, 27), 'Memorial Day')
(datetime.date(2013, 7, 4), 'Independence Day')
(datetime.date(2013, 9, 2), 'Labor Day')
(datetime.date(2013, 10, 14), 'Columbus Day')
(datetime.date(2013, 11, 11), 'Veterans Day')
(datetime.date(2013, 11, 28), 'Thanksgiving')
(datetime.date(2013, 12, 25), 'Christmas Day')

I need to pass a date range (fromdate, todate) to the us_holidays object and get back the list of holidays within that range, like this:

DatetimeIndex(['2013-12-25', '2014-01-01'], dtype='datetime64[ns]', freq=None)
[10, 17]

In Pandas, I can get it with the following:

cal = USFederalHolidayCalendar()
holidays = pd.to_datetime(cal.holidays(start_date, end_date))
print holidays

As mentioned above, I need to use AWS Glue, and Glue does not currently support Pandas.

Any help is appreciated.

Thanks

1 answer:

Answer 0: (score: 0)

After exploring for a while, I found a solution myself and am sharing it here for reference:

%pyspark
import holidays
import datetime
from datetime import date, timedelta
import dateutil
from dateutil.relativedelta import relativedelta

us_holidays = holidays.UnitedStates()
custom_holidays = holidays.HolidayBase()
holidays_within_range = []
fmt = '%Y-%m-%d'
holidays2013 = []

# List the California holidays for 2013 as formatted strings.
for date2, name in sorted(holidays.US(state='CA', years=2013).items()):
    holidays2013.append(date2.strftime(fmt))
print holidays2013

# Reference date; the range runs from 7 days before it to 1 month after it.
fdate = date(2013, 1, 1)

s_date = fdate - relativedelta(days=7)
e_date = fdate + relativedelta(months=1)
start_date = s_date.strftime(fmt)
end_date = e_date.strftime(fmt)
print "Range : "
print start_date, end_date

# Walk every day in the range (inclusive) and keep the ones that are holidays.
dd = [s_date + timedelta(days=x) for x in range((e_date - s_date).days + 1)]
for d in dd:
    if d in us_holidays:
        custom_holidays.append(d)
        holidays_within_range.append(d.strftime(fmt))

print holidays_within_range
# Absolute distance in days between the reference date and each holiday found.
days_from_closest_holiday = [(abs(fdate - datetime.datetime.strptime(hdate, fmt).date())).days for hdate in holidays_within_range]
print days_from_closest_holiday

The output of the above is:

['2013-01-01', '2013-01-21', '2013-02-18', '2013-03-31', '2013-04-01', '2013-05-27', '2013-07-04', '2013-09-02', '2013-10-14', '2013-11-11', '2013-11-28', '2013-12-25']
Range : 
2012-12-25 2013-02-01
['2012-12-25', '2013-01-01', '2013-01-21']
[7, 0, 20]

This does not require Pandas, and I hope it works on AWS Glue. The variable names have not been tidied up, so change them as needed.
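The same idea can be wrapped in two small helpers. This is only a minimal sketch under some assumptions: the names holidays_in_range, days_from and sample_calendar are made up for illustration, and sample_calendar is a plain dict standing in for the holidays object so that the snippet runs with the standard library alone. Since the helper relies only on the `in` check, passing holidays.UnitedStates() in place of the dict should behave the same way, but I have not verified that on Glue.

```python
from datetime import date, timedelta


def holidays_in_range(calendar, start, end):
    """Return the dates d with start <= d <= end for which `d in calendar`."""
    found = []
    for offset in range((end - start).days + 1):
        day = start + timedelta(days=offset)
        if day in calendar:
            found.append(day)
    return found


def days_from(reference, dates):
    """Absolute distance in days between `reference` and each date."""
    return [abs((reference - d).days) for d in dates]


# Stand-in calendar (assumption): a plain dict holding the three dates from
# the output above. Any container that supports `date in container` works.
sample_calendar = {
    date(2012, 12, 25): "Christmas Day",
    date(2013, 1, 1): "New Year's Day",
    date(2013, 1, 21): "Martin Luther King, Jr. Day",
}

fdate = date(2013, 1, 1)
in_range = holidays_in_range(sample_calendar, date(2012, 12, 25), date(2013, 2, 1))
distances = days_from(fdate, in_range)

print([d.strftime('%Y-%m-%d') for d in in_range])
print(distances)
```

With the stand-in data this prints the same two lists as the last two lines of the output above. An end date earlier than the start date simply yields an empty list.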

Thanks