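There are many posts on this topic. I went through them but could not find an answer to my question:
I am working with a pandas time series DataFrame. The data is at daily frequency, and I aggregate it to weekly frequency with the pandas resample() function, as shown below.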
daily_df = ...  # daily time series DataFrame (placeholder)

def aggregate(daily_df, frequency):
    # Aggregate the daily OHLCV rows into one row per period
    weekly_df = daily_df.resample(frequency, on='date').agg({'open':'first','high':'max', 'low':'min','close':'last','volume':'sum'})
    # Turn the resampled 'date' index back into a regular column
    weekly_df.reset_index(inplace=True)
    return weekly_df

weekly_df = aggregate(daily_df, 'W-Fri')
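The problem I am facing is that in some weeks the time series only contains data for Monday through Thursday, and I don't know how to tell resample() to check for this and, if that is the case, end the week on Thursday instead of Friday ('W-Fri').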
Answer 0 (score: 0)
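Since resample() has no such option, we can add a per-day flag column and count it during aggregation to find out how many days were resampled into each week.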
import yfinance as yf

daily_df = yf.download("AAPL", start="2020-11-01", end="2020-12-31")

def aggregate(daily_df, frequency):
    # Move the 'Date' index into a column so resample(on='Date') can use it
    daily_df.reset_index(inplace=True)
    # Flag every daily row; counting the flag gives the number of days in each week
    daily_df['days'] = 1
    weekly_df = daily_df.resample(frequency, on='Date').agg({'Open':'first','High':'max', 'Low':'min','Close':'last','Volume':'sum','days':'count'})
    return weekly_df

weekly_df = aggregate(daily_df, 'W-Fri')
weekly_df
Open High Low Close Volume days
Date
2020-11-06 109.110001 119.620003 107.320000 118.690002 609571800 5
2020-11-13 120.500000 121.989998 114.129997 119.260002 589577900 5
2020-11-20 118.919998 120.989998 116.809998 117.339996 389493400 5
2020-11-27 117.180000 117.620003 112.589996 116.589996 365024000 4
2020-12-04 116.970001 123.779999 116.809998 122.250000 543809200 5
2020-12-11 122.309998 125.949997 120.150002 122.410004 452278700 5
2020-12-18 122.599998 129.580002 121.540001 126.660004 621866700 5
2020-12-25 125.019997 134.410004 123.449997 131.970001 433310200 4
2021-01-01 133.990005 138.789993 133.399994 133.720001 341985600 3
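The days column shows which weeks are short, but the rows are still labelled with the Friday. If the goal is to label each week with the last day that actually has data, one possible approach is to aggregate a copy of the date column with 'max' and use it as the index. The following is only a minimal sketch under assumptions: the data is synthetic (no download), and the helper name aggregate_with_last_day and the column name last_day are made up for illustration.

```python
import pandas as pd

# Synthetic daily data (an assumption for illustration): three business weeks,
# with Friday 2020-11-27 removed to mimic a week that ends on Thursday.
dates = pd.bdate_range("2020-11-16", "2020-12-04")
dates = dates[dates != pd.Timestamp("2020-11-27")]
daily_df = pd.DataFrame({
    "Date": dates,
    "Close": range(100, 100 + len(dates)),
    "Volume": 10,
})

def aggregate_with_last_day(daily_df, frequency):
    # Hypothetical helper, not part of pandas.
    df = daily_df.copy()          # leave the caller's DataFrame untouched
    df["last_day"] = df["Date"]   # copy of the date, so it can be aggregated
    df["days"] = 1                # per-day flag, as in the answer above
    weekly = df.resample(frequency, on="Date").agg(
        {"Close": "last", "Volume": "sum", "last_day": "max", "days": "sum"}
    )
    # Weeks with no rows at all have no last day; drop them
    weekly = weekly[weekly["days"] > 0]
    # Label each week by the last date that is actually present in it
    return weekly.set_index("last_day").rename_axis("Date")

# "W-FRI" is the weekly frequency anchored on Friday
weekly_df = aggregate_with_last_day(daily_df, "W-FRI")
print(weekly_df)
```

With this data, the short week should be labelled 2020-11-26 (Thursday) instead of 2020-11-27, while the full weeks keep their Friday labels.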