Question

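I apologize if I've done anything wrong here, but I have already spent a long time trying to solve this on my own. I have a huge data set containing historical prices (daily close) for around 3,000 stocks, which comes to over 15 million rows. The problem is that I can't seem to resample the data without losing a large amount of it. My goal is to keep only the monthly close for every stock while preserving the shape of the data, i.e. the ticker, date, and close columns.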

wiki_prices_df = pd.read_csv('/gitHub/finance/PRICES_03_11_18.csv',usecols=['ticker','date','close'],parse_dates=['date'])

wiki_prices_df[:10]
    ticker  date    close
0   A   1999-11-18  44.00
1   A   1999-11-19  40.38
2   A   1999-11-22  44.00
3   A   1999-11-23  40.25
4   A   1999-11-24  41.06
5   A   1999-11-26  41.19
6   A   1999-11-29  42.13
7   A   1999-11-30  42.19
8   A   1999-12-01  42.94
9   A   1999-12-02  44.13

wiki_prices_df.shape
(15360208, 3)

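The 'date' column has already been parsed as timestamp data. I have had mixed results and am not sure whether I should use "date_range" or ".resample", or how to use either, to reduce the rows to month-end only. I tried this: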

wiki_prices_df_monthly = wiki_prices_df.resample('M', on='date')
wiki_prices_df_monthly.shape
print(type(wiki_prices_df_monthly))

/anaconda/lib/python3.6/site-packages/ipykernel_launcher.py:2: FutureWarning: 
.resample() is now a deferred operation
You called shape(...) on this deferred object which materialized it into a dataframe
by implicitly taking the mean.  Use .resample(...).mean() instead

<class 'pandas.core.resample.DatetimeIndexResampler'>

Answer 1

It sounds like what you want to do is filter, not resample.

wiki_prices_df["date"] = pd.to_datetime(wiki_prices_df["date"])
wiki_prices_df = wiki_prices_df[wiki_prices_df["date"].dt.is_month_end]

This removes every row except those dated on the last calendar day of a month.
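One caveat with filtering on `is_month_end`: it tests the calendar date, so a month whose last trading day falls before the calendar month end (for example, when the month ends on a weekend) contributes no row at all. If the goal is the last available close per ticker per month, a minimal sketch of one alternative is to group by ticker and month and keep the final row of each group. The small frame below is made-up sample data standing in for `wiki_prices_df`, and the names `df`, `month`, and `monthly` are illustrative only.

```python
import pandas as pd

# Made-up sample standing in for wiki_prices_df (values are illustrative).
# 2000-04-28 is a Friday; the calendar month end, 2000-04-30, is a Sunday.
df = pd.DataFrame({
    "ticker": ["A", "A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime([
        "1999-11-29", "1999-11-30", "1999-12-30", "1999-12-31",
        "2000-04-27", "2000-04-28", "2000-05-31",
    ]),
    "close": [42.13, 42.19, 43.00, 44.00, 10.00, 10.50, 11.00],
})

# Sort so the last row in each (ticker, month) group is the latest date.
df = df.sort_values(["ticker", "date"])

# Label each row with its calendar month, then keep the final row per group.
month = df["date"].dt.to_period("M")
monthly = df.groupby(["ticker", month]).tail(1).reset_index(drop=True)

# For comparison: the calendar month-end filter drops ticker B's April row.
calendar_only = df[df["date"].dt.is_month_end]

print(monthly)
print(len(monthly), len(calendar_only))
```

The result keeps the original three columns, with `date` holding the actual last trading day present in the data rather than a synthetic month-end timestamp. On the full data set the same two lines would apply to `wiki_prices_df` directly, assuming it fits in memory.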
