Pandas: days to the last day of the month

Time: 2018-10-25 17:56:25

Tags: python pandas

Given a DataFrame of intraday data:

                          Open       High        Low       Last     Volume  No. Trades   Close Bid  No. Bids   Close Ask  No. Asks
Timestamp                                                                                                                         
1996-01-02 09:30:00        NaN        NaN        NaN        NaN        NaN         NaN   61.375000       1.0   61.406250       1.0
1996-01-02 09:31:00   61.40625   61.40625   61.40625   61.40625     4100.0         1.0         NaN       NaN         NaN       NaN
1996-01-02 09:32:00   61.40625   61.40625   61.40625   61.40625      100.0         1.0   61.375000       2.0   61.406250       2.0
1996-01-02 09:33:00        NaN        NaN        NaN        NaN        NaN         NaN   61.406250       2.0   61.437500       2.0
1996-01-02 09:35:00        NaN        NaN        NaN        NaN        NaN         NaN   61.390625       1.0   61.421875       1.0

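How can I select the days that fall n days before the end of the month? I would like to use groupby(), but I am not sure how to handle intraday data, because the index contains more than one value per day.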

import pandas as pd

def select_days(data, n_days, rtn = ''):
    # note: rtn is not used in this snippet

    ### select business days
    ts_days = pd.to_datetime(data.index.date)
    businessDays = pd.bdate_range(start=data.index[0].date(), end=data.index[-1].date())
    data = data[ts_days.isin(businessDays)]

    ### select T-n days (the filtered frame must be assigned back before returning)
    data = data[(data.index.days_in_month - data.index.day)==n_days]
    return data

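3 Answers:

Answer 0: (score: 0)

To generalize to a dataset that spans several months, I would group the rows by the number of days remaining until the end of their month, then select the group you want. The trick is that you cannot select on a fixed day of the month, because the date of the last day differs from month to month.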

from calendar import monthrange

def days_until_end(date):
    _, last_day = monthrange(date.year, date.month)
    return last_day - date.day

df.groupby(days_until_end).get_group(N)

N is the number of days before the end of the month that you want to target.
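As a rough check of the groupby approach, here is a minimal sketch on synthetic data. The frame, its single `Last` column, and the six-hour spacing are assumptions made for illustration and are not the asker's data. When a function is passed to groupby, pandas calls it on each index label, so every intraday row from the same calendar day lands in the same group.

```python
from calendar import monthrange

import numpy as np
import pandas as pd

def days_until_end(date):
    # monthrange returns (weekday of the first day, number of days in the month)
    _, last_day = monthrange(date.year, date.month)
    return last_day - date.day

# Hypothetical intraday frame: one row every 6 hours across January and February 1996
idx = pd.date_range("1996-01-01", "1996-02-29 18:00", freq=pd.Timedelta(hours=6))
df = pd.DataFrame({"Last": np.arange(len(idx), dtype=float)}, index=idx)

N = 2
# groupby calls days_until_end on every Timestamp in the index
selected = df.groupby(days_until_end).get_group(N)
selected_dates = sorted(set(selected.index.strftime("%Y-%m-%d")))
print(selected_dates)
```

January 1996 has 31 days and February 1996 has 29, so with N = 2 the selected rows should come from 29 January and 27 February.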

Answer 1: (score: 0)

There is no need for groupby here. First get a sequence that tells you, for each row, the number of days in that row's month:

days_in_month = df.index.daysinmonth

Next, get the day of the month for each row:

day_of_month = df.index.day

Now you can easily do:

df[(days_in_month - day_of_month).to_series(index=df.index).between(0, n_days)]

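where n_days is your parameter. Note that between(0, n_days) is inclusive on both ends, so this keeps every row from the last n_days + 1 calendar days of each month, not only the day exactly n_days before month end.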
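The range selection can also be written with a plain boolean array, which sidesteps any index alignment between the mask and the frame. This is a sketch under assumptions: the synthetic frame and the names `days_left`, `mask`, and `tail` are illustrative and do not come from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical intraday frame: one row every 6 hours across January and February 1996
idx = pd.date_range("1996-01-01", "1996-02-29 18:00", freq=pd.Timedelta(hours=6))
df = pd.DataFrame({"Last": np.arange(len(idx), dtype=float)}, index=idx)

n_days = 2
# Calendar days remaining in the month for each row, as a NumPy array
days_left = np.asarray(df.index.days_in_month - df.index.day)

# Positional boolean mask: keep rows that are 0..n_days days from month end
mask = (days_left >= 0) & (days_left <= n_days)
tail = df[mask]
tail_dates = sorted(set(tail.index.strftime("%Y-%m-%d")))
print(tail_dates)
```

With n_days = 2 this should keep 29 to 31 January and 27 to 29 February, that is, three calendar days per month.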

Answer 2: (score: 0)

A pandas trick:

df['days_to_month_end'] = df.index.days_in_month - df.index.day
df[df.days_to_month_end==n]

Or in one line:

df[(df.index.days_in_month - df.index.day)==n]
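For reuse, the one-liner could be wrapped in a small helper. The function name `rows_n_days_before_month_end`, the type check, and the sample frame below are hypothetical and only meant as a sketch; the counting is in calendar days, not business days.

```python
import numpy as np
import pandas as pd

def rows_n_days_before_month_end(frame, n):
    """Return rows whose date is exactly n calendar days before month end.

    Hypothetical helper; assumes frame is indexed by a DatetimeIndex.
    """
    if not isinstance(frame.index, pd.DatetimeIndex):
        raise TypeError("frame must be indexed by a DatetimeIndex")
    # Comparing the resulting Index with a scalar yields a boolean array
    days_left = frame.index.days_in_month - frame.index.day
    return frame[days_left == n]

# Hypothetical sample: rows every 12 hours from 30 January to 1 February 1996
idx = pd.date_range("1996-01-30 09:30", "1996-02-01 09:30", freq=pd.Timedelta(hours=12))
df = pd.DataFrame({"Last": np.arange(len(idx), dtype=float)}, index=idx)

last_day_rows = rows_n_days_before_month_end(df, 0)   # rows on 31 January
day_before_rows = rows_n_days_before_month_end(df, 1)  # rows on 30 January
print(last_day_rows)
print(day_before_rows)
```

All intraday rows of a matching day are returned together, so any further per-day aggregation is left to the caller.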