Given a DataFrame of intraday data:
                         Open      High       Low      Last  Volume  No. Trades  Close Bid  No. Bids  Close Ask  No. Asks
Timestamp
1996-01-02 09:30:00       NaN       NaN       NaN       NaN     NaN         NaN  61.375000       1.0  61.406250       1.0
1996-01-02 09:31:00  61.40625  61.40625  61.40625  61.40625  4100.0         1.0        NaN       NaN        NaN       NaN
1996-01-02 09:32:00  61.40625  61.40625  61.40625  61.40625   100.0         1.0  61.375000       2.0  61.406250       2.0
1996-01-02 09:33:00       NaN       NaN       NaN       NaN     NaN         NaN  61.406250       2.0  61.437500       2.0
1996-01-02 09:35:00       NaN       NaN       NaN       NaN     NaN         NaN  61.390625       1.0  61.421875       1.0
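How can I select the days that fall n days before the end of the month? I would like to use groupby(), but I am not sure how to handle intraday data, because the index holds many timestamps per day rather than one value per day.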
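import pandas as pd

def select_days(data, n_days, rtn=''):
    # Keep only rows whose date is a business day (Mon-Fri; holidays are not excluded).
    ts_days = pd.to_datetime(data.index.date)
    business_days = pd.bdate_range(start=data.index[0].date(), end=data.index[-1].date())
    data = data[ts_days.isin(business_days)]
    # Keep only rows that fall exactly n_days calendar days before the end of their month.
    # The filtered result must be assigned back, otherwise it is discarded.
    data = data[(data.index.days_in_month - data.index.day) == n_days]
    return data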
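Answer 0 (score: 0)
To generalize to a dataset that spans several months, I would group the rows by the number of days remaining until the end of the month, and then select the group you want. The catch is that you cannot select on a fixed day of the month, because the date of the last day differs from month to month.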
from calendar import monthrange

def days_until_end(date):
    # monthrange returns (weekday of the 1st, number of days in the month)
    _, last_day = monthrange(date.year, date.month)
    return last_day - date.day

df.groupby(days_until_end).get_group(N)
Here N is the number of days before the end of the month that you want to target.
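As a rough illustration of why this works with intraday data, the sketch below applies the same grouping to a made-up frame with two bars per day. The sample data and the "Last" values are invented for the example; the point is only that groupby calls the function on every index label, so all intraday rows of a given date end up in the same group.

```python
from calendar import monthrange

import pandas as pd


def days_until_end(date):
    """Calendar days between `date` and the last day of its month."""
    _, last_day = monthrange(date.year, date.month)
    return last_day - date.day


# Made-up intraday sample: two bars per day over January and February 1996.
days = pd.date_range("1996-01-01", "1996-02-29", freq="D")
index = pd.DatetimeIndex(
    [d + pd.Timedelta(hours=h) for d in days for h in (10, 15)],
    name="Timestamp",
)
df = pd.DataFrame({"Last": range(len(index))}, index=index)

N = 2
# The function is evaluated once per index label (a Timestamp),
# so every intraday row is keyed by the offset of its own date.
selected = df.groupby(days_until_end).get_group(N)
print(selected)
```

With this sample, the selected rows are the two bars of 1996-01-29 and the two bars of 1996-02-27. Note that get_group raises a KeyError when no row has the requested offset.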
Answer 1 (score: 0)
There is no need for groupby here. First, get a sequence that tells you, for each row, the number of days in that row's month:
days_in_month = df.index.daysinmonth
Next, get the day of the month for each row:
day_of_month = df.index.day
Now you can simply do:
df[(days_in_month - day_of_month).to_series(index=df.index).between(0, n_days)]
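where n_days is your parameter. Note that this keeps every row from n_days before the end of the month through the last day of the month, not only the rows exactly n_days before it.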
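Because the offset from month end is never negative, the range check above amounts to "offset <= n_days". A minimal sketch of that equivalent form follows, using an invented one-month sample; comparing the index arithmetic directly yields a plain boolean array, so no index alignment is involved.

```python
import pandas as pd

# Made-up sample: one bar at 09:30 and one at 15:30 for each day of January 1996.
days = pd.date_range("1996-01-01", "1996-01-31", freq="D")
index = pd.DatetimeIndex(
    [d + pd.Timedelta(t) for d in days for t in ("09:30:00", "15:30:00")]
)
df = pd.DataFrame({"Last": 61.0}, index=index)

n_days = 2
# Calendar days left in the month for every row (30 on Jan 1, 0 on Jan 31).
days_left = df.index.days_in_month - df.index.day
# Boolean array with one entry per row, True for the last n_days + 1 dates.
mask = days_left <= n_days
last_days = df[mask]
print(last_days)
```

For January this keeps the bars of the 29th, 30th and 31st.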
Answer 2 (score: 0)
The pandas way:
df['days_to_month_end'] = df.index.days_in_month - df.index.day
df[df.days_to_month_end==n]
Or as a one-liner:
df[(df.index.days_in_month - df.index.day)==n]
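The question also restricts the data to business days. One possible way to combine that with the one-liner is sketched below. The helper name select_t_minus_n and the sample data are hypothetical, and the weekday test is only an approximation of "business day": like pd.bdate_range with default settings, it does not exclude holidays.

```python
import pandas as pd


def select_t_minus_n(data, n):
    """Rows on weekdays that fall exactly n calendar days before month end.

    Hypothetical helper for illustration; holidays are not excluded.
    """
    is_weekday = data.index.dayofweek < 5  # Monday=0 ... Friday=4
    offset = data.index.days_in_month - data.index.day
    return data[is_weekday & (offset == n)]


# Made-up sample: two bars per calendar day for the first quarter of 1996.
days = pd.date_range("1996-01-01", "1996-03-31", freq="D")
index = pd.DatetimeIndex(
    [d + pd.Timedelta(t) for d in days for t in ("09:30:00", "15:30:00")]
)
df = pd.DataFrame({"Last": 61.0}, index=index)

picked = select_t_minus_n(df, 1)
print(picked)
```

In this sample, January and February each contribute their two bars (1996-01-30 and 1996-02-28), while March contributes nothing because 1996-03-30 is a Saturday. If a month must always yield a day, the offset would have to be counted in business days instead of calendar days, which this sketch does not attempt.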