pandas: subtracting a timedelta in a loop

Time: 2017-05-22 14:15:26

Tags: python pandas

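I have daily data and a loop that finds the third Friday of each month and then sets a column to 2 for every day within 20 days of that third Friday. However, the flag is only applied to the days after the third Friday, and I don't understand why. My DataFrame "merged" looks like this: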

Date         ID    Window
01/01/2000   1        0
01/01/2000   1        0
02/01/2000   2        0
02/01/2000   2        0

The current code is as follows:

# Get the third Friday of a month:

import calendar
import pandas as pd

c = calendar.Calendar(firstweekday=calendar.SUNDAY)
year = 2000; month = 3
monthcal = c.monthdatescalendar(year,month)
third_friday = [day for week in monthcal for day in week if \
            day.weekday() == calendar.FRIDAY and \
            day.month == month][2]

#Loop through dates to change the window column:

for beg in pd.date_range("2000-01-01", "2017-05-01"):
    beg = third_friday
    merged["window"].loc[beg: beg + pd.to_timedelta(20,"D")] = 2
    merged["window"].loc[beg: beg - pd.to_timedelta(20,"D")] = 2

#repeat the same for the next Fridays:
    if month==12:
       year=year+1
       month=0
    if year>=2017 and month>=3:
       break
    month = month +3
    monthcal = c.monthdatescalendar(year,month)
    third_friday = [day for week in monthcal for day in week if \
                day.weekday() == calendar.FRIDAY and \
                day.month == month][2] 

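When I run this code, the window column is not set to 2 for the days before the third Friday. Only the 20 days after the third Friday are changed to 2. Does anyone know what I am doing wrong?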

1 Answer:

Answer 0: (score: 1)

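Third Friday of the month

The easiest approach is to define a function that computes the third Friday of a month, given a year and a month. Either use your approach with calendar, or something like this would also work: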

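def third_friday_of(year, month):
    # The third Friday of any month always falls on the 15th through the 21st
    daterange = pd.date_range(start=pd.Timestamp(year, month, 15), end=pd.Timestamp(year, month, 21), freq='D')
    # weekday == 4 is Friday (Monday is 0)
    return daterange[daterange.weekday == 4][0]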

This returns a pandas.Timestamp, but that is a subclass of datetime.datetime, so it should not cause further problems in your program.
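As a quick sanity check, the sketch below compares a pd.date_range version of the helper with the calendar-based approach from the question. The name third_friday_calendar is made up for this illustration, and March 2000 is just an example month.

```python
import calendar
from datetime import datetime

import pandas as pd


def third_friday_of(year, month):
    # The third Friday of any month falls between the 15th and the 21st.
    days = pd.date_range(pd.Timestamp(year, month, 15), periods=7, freq="D")
    return days[days.weekday == 4][0]


def third_friday_calendar(year, month):
    # Hypothetical helper: the calendar-module approach from the question.
    c = calendar.Calendar(firstweekday=calendar.SUNDAY)
    fridays = [day
               for week in c.monthdatescalendar(year, month)
               for day in week
               if day.weekday() == calendar.FRIDAY and day.month == month]
    return fridays[2]


tf = third_friday_of(2000, 3)
print(tf)                        # 2000-03-17 00:00:00
print(isinstance(tf, datetime))  # True
print(tf.date() == third_friday_calendar(2000, 3))  # True
```

Both helpers agree on the date; the only difference is that one returns a pandas.Timestamp and the other a datetime.date.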

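The actual calculation

I would also define a separate function that actually modifies the DataFrame, taking the interval and the window as parameters: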

def process_dataframe(df, begin_year, begin_month, end_year, end_month, interval_months=3, window=20):
    # One month-start date every interval_months months, with both end points included
    dates = pd.date_range(start=pd.Timestamp(begin_year, begin_month, 1), end=pd.Timestamp(end_year, end_month, 1), freq='%iMS' % interval_months)
    for d in dates:
        third_friday = third_friday_of(d.year, d.month)
        # df needs a sorted DatetimeIndex; both bounds of a label slice are inclusive
        df.loc[third_friday - pd.Timedelta(window, unit='D') : third_friday + pd.Timedelta(window, unit='D'), 'Window'] = 2
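To see the effect on data, here is a minimal, self-contained sketch that runs the two functions on a made-up daily frame for the year 2000. It assumes the DataFrame is indexed by a sorted DatetimeIndex (the question shows Date as a column, so it would have to be set as the index first) and that the range is built with pd.date_range and a month-start frequency.

```python
import pandas as pd


def third_friday_of(year, month):
    # The third Friday of any month falls between the 15th and the 21st.
    days = pd.date_range(pd.Timestamp(year, month, 15), periods=7, freq="D")
    return days[days.weekday == 4][0]


def process_dataframe(df, begin_year, begin_month, end_year, end_month,
                      interval_months=3, window=20):
    # One month-start date every `interval_months` months, end month included.
    months = pd.date_range(pd.Timestamp(begin_year, begin_month, 1),
                           pd.Timestamp(end_year, end_month, 1),
                           freq="%dMS" % interval_months)
    delta = pd.Timedelta(days=window)
    for d in months:
        tf = third_friday_of(d.year, d.month)
        # Label slices on a DatetimeIndex include both end points.
        df.loc[tf - delta: tf + delta, "Window"] = 2


# Hypothetical stand-in for the question's data, indexed by date.
idx = pd.date_range("2000-01-01", "2000-12-31", freq="D")
merged = pd.DataFrame({"ID": 1, "Window": 0}, index=idx)

process_dataframe(merged, 2000, 3, 2000, 12)

# Around the third Friday of March 2000 (the 17th), the flag covers
# 2000-02-26 through 2000-04-06.
print(merged.loc["2000-02-25":"2000-02-27", "Window"].tolist())  # [0, 2, 2]
print(merged.loc["2000-04-05":"2000-04-07", "Window"].tolist())  # [2, 2, 0]
```

The function changes df in place, so nothing needs to be reassigned after the call.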

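The reason it probably does not work for you is that merged["window"].loc[beg: beg - pd.to_timedelta(20,"D")] = 2 should be merged["window"].loc[beg - pd.to_timedelta(20,"D"):beg] = 2. A label slice runs from its first bound to its second, so when the start date is later than the end date it selects no rows and nothing is assigned.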
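The difference between the two slice orders can be checked on a small series. This is only an illustration with made-up dates, assuming a sorted daily DatetimeIndex:

```python
import pandas as pd

idx = pd.date_range("2000-02-01", "2000-04-30", freq="D")
s = pd.Series(0, index=idx)

beg = pd.Timestamp("2000-03-17")
delta = pd.Timedelta(days=20)

backwards = s.loc[beg: beg - delta]  # start is later than end
forwards = s.loc[beg - delta: beg]   # start is earlier than end

print(len(backwards))  # 0
print(len(forwards))   # 21
```

The backwards slice is simply empty, so assigning to it raises no error and silently does nothing.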

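Chained assignment

merged["window"].loc[beg: beg + pd.to_timedelta(20,"D")] = 2 has a second problem of its own. With merged["window"] you first get a Series, and it is not 100% clear or guaranteed whether that is a view or a copy. It is better to do this in a single .loc call, as in my code.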
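As a rough sketch of the single-call form, with made-up data and a shorter 5-day window chosen only to keep the example small:

```python
import pandas as pd

idx = pd.date_range("2000-03-01", "2000-03-31", freq="D")
merged = pd.DataFrame({"window": 0}, index=idx)

beg = pd.Timestamp("2000-03-17")
delta = pd.Timedelta(days=5)

# Chained form (not executed here): merged["window"] may hand back a copy,
# in which case the assignment never reaches `merged`.
#   merged["window"].loc[beg - delta: beg + delta] = 2

# Single indexing call: rows and column are selected together, so the
# assignment is applied to `merged` itself.
merged.loc[beg - delta: beg + delta, "window"] = 2

print(int((merged["window"] == 2).sum()))  # 11
```

Selecting the rows and the column in one .loc call also covers both sides of the third Friday at once, so the two separate assignments from the question collapse into one.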