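I have two columns in a pandas DataFrame: a start date and an end date.
I want to know whether the period covered by each row contains any holiday.
I want to create a new column that shows yes or no.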
I know how to check whether a specific date is a holiday,
but how do I check the whole period covered by each row?
id Start Date End Date
0 2019-09-27 2019-10-06
1 2019-10-09 2019-10-22
2 2019-05-04 2019-05-15
3 2019-09-18 2019-09-29
I would like another boolean column that indicates, for each row (id), whether any holiday falls between its start date and end date.
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar as calendar
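df = pd.DataFrame({'Start Date': ['2019-09-27', '2019-10-09', '2019-05-04', '2019-09-18'],
                   'End Date': ['2019-10-06', '2019-10-22', '2019-05-15', '2019-09-29']})
# Parse the strings so they compare correctly against the holiday timestamps
df['Start Date'] = pd.to_datetime(df['Start Date'])
df['End Date'] = pd.to_datetime(df['End Date'])
# To check if a specific date is a holiday or not
holidays = calendar().holidays(start=df['Start Date'].min(), end=df['Start Date'].max())
df['Holiday'] = df['Start Date'].isin(holidays)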
# This can only check if the start date is a holiday
id Start Date Holiday
0 2019-09-27 False
1 2019-10-09 False
2 2019-05-04 False
3 2019-09-18 False
# But how can I check the duration between df['Start Date'] and df['End Date'] of each row?
Answer 0 (score: 3)
Here is what I would do:
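# Rebuild the holiday list over the full span, up to the latest end date
holidays = calendar().holidays(start=df['Start Date'].min(), end=df['End Date'].max())
# Parse both columns so they compare cleanly against the holiday timestamps
starts, ends = pd.to_datetime(df['Start Date']), pd.to_datetime(df['End Date'])
# True when at least one holiday z satisfies start <= z <= end
l = [any(x <= z <= y for z in holidays) for x, y in zip(starts, ends)]
# l == [False, True, False, False]
df['Holiday'] = l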
Also see: When should I ever want to use pandas apply() in my code?
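For larger frames, the same interval test can be done without a Python-level loop. The following is only a minimal sketch, assuming both columns parse with pd.to_datetime and relying on the holiday index being sorted; the names start, end, lo and hi are illustrative and not part of the original answer. It counts the holidays strictly before each start date and the holidays on or before each end date, and a row contains a holiday when the second count is larger.

```python
import numpy as np
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

df = pd.DataFrame({
    'Start Date': ['2019-09-27', '2019-10-09', '2019-05-04', '2019-09-18'],
    'End Date': ['2019-10-06', '2019-10-22', '2019-05-15', '2019-09-29'],
})
start = pd.to_datetime(df['Start Date'])
end = pd.to_datetime(df['End Date'])

# One sorted holiday index covering every row's period
holidays = USFederalHolidayCalendar().holidays(start=start.min(), end=end.max())
holiday_values = holidays.to_numpy()

# lo: number of holidays strictly before each start date
lo = np.searchsorted(holiday_values, start.to_numpy(), side='left')
# hi: number of holidays on or before each end date
hi = np.searchsorted(holiday_values, end.to_numpy(), side='right')

# A holiday lies in [start, end] exactly when hi exceeds lo
df['Holiday'] = hi > lo
print(df)
```

Both endpoints are treated as inclusive here, matching the x<=z and y>=z test in the list comprehension above.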
Answer 1 (score: 1)
Apply a check function to each row of the data frame:
df['Holiday'] = df.apply(
    lambda row: calendar().holidays(start=row['Start Date'],
                                    end=row['End Date']).size,
    axis=1,
).astype(bool)  # A nonzero holiday count becomes True
#0 False
#1 True
#2 False
#3 False
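The row-wise approach builds a new calendar lookup for every row. A possible variation, sketched here under the assumption that one holiday index covering the whole frame is acceptable, computes the holidays once and reuses them inside the applied function. The helper name has_holiday is hypothetical and is not part of pandas.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

df = pd.DataFrame({
    'Start Date': ['2019-09-27', '2019-10-09', '2019-05-04', '2019-09-18'],
    'End Date': ['2019-10-06', '2019-10-22', '2019-05-15', '2019-09-29'],
})
df['Start Date'] = pd.to_datetime(df['Start Date'])
df['End Date'] = pd.to_datetime(df['End Date'])

# Build the holiday index once for the whole span covered by the frame
holidays = USFederalHolidayCalendar().holidays(
    start=df['Start Date'].min(), end=df['End Date'].max())

def has_holiday(row):
    """Hypothetical helper: True if any holiday falls within the row's period."""
    in_range = (holidays >= row['Start Date']) & (holidays <= row['End Date'])
    return bool(in_range.any())

df['Holiday'] = df.apply(has_holiday, axis=1)
print(df['Holiday'])
```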