I want to retrieve lagged data from a dataset. The dataset is monthly and looks like this:
Final Profits
JCCreateDate
2016-04-30 31163371.59
2016-05-31 27512300.34
...
2019-02-28 16800693.82
2019-03-31 5384227.13
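From this dataset I selected a window of data (the most recent 12 months), and I want to subtract 3, 6, 9, and 12 months from its dates.
I created the window dataset like this: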
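import pandas as pd
# Assumption: the first CSV column holds the dates (JCCreateDate), parsed here as a DatetimeIndex.
df_all = pd.read_csv('dataset.csv', index_col=0, parse_dates=True)
df = pd.read_csv('window_dataset.csv', index_col=0, parse_dates=True)
data_start, data_end = pd.to_datetime(df.first_valid_index()), pd.to_datetime(df.last_valid_index())
dr = pd.date_range(data_start, data_end, freq='M')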
Now, for the date range dr, I want to subtract months. Say I subtract 3 months from dr and then try to retrieve the data from df_all:
df_all.loc[dr - pd.DateOffset(months=3)]
This gives me the following output:
Final Profits
2018-01-30 NaN
2018-02-28 9240766.46
2018-03-30 NaN
2018-04-30 13250515.05
2018-05-31 12539224.15
2018-06-30 17778326.04
2018-07-31 19345671.02
2018-08-30 NaN
2018-09-30 14815607.14
2018-10-31 28979099.74
2018-11-28 NaN
2018-12-31 12395273.24
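As you can see, I get some NaN values. Months such as January and March have 31 days, but subtracting three months from a 30-day month-end keeps the day of the month, so 2018-04-30 becomes 2018-01-30 instead of 2018-01-31, and the lookup searches for a date that is not in the index. How can I handle this?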
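The mismatch can be reproduced in isolation. This is a minimal sketch with three hand-picked month-end dates (the variable names `month_ends` and `shifted` are illustrative, not from the original code); it only shows which shifted dates still fall on a month end.

```python
import pandas as pd

# Three month-end dates taken from the window above.
month_ends = pd.DatetimeIndex(["2018-04-30", "2018-05-31", "2018-06-30"])

# DateOffset(months=3) keeps the day of the month where possible and
# clips it only when the target month is shorter.
shifted = month_ends - pd.DateOffset(months=3)

# Which of the shifted dates are still month ends?
print(shifted)
print(shifted.is_month_end)
```

Only the date that was clipped to the end of February lands on a month end; the other two land on the 30th of a 31-day month, which is why those rows come back as NaN.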
Answer 0 (score: 0)
I'm not 100% sure what you are looking for, but I suspect shift will do it.
import numpy as np
import pandas as pd

# set up dataframe
index = pd.date_range(start='2016-04-30', end='2019-03-31', freq='M' )
df = pd.DataFrame(np.random.randint(5000000, 50000000, 36), index=index, columns=['Final Profits'])
# create three columns by shifting 'Final Profits' and subtracting
df['3mos'] = df['Final Profits'] - df['Final Profits'].shift(3)
df['6mos'] = df['Final Profits'] - df['Final Profits'].shift(6)
df['9mos'] = df['Final Profits'] - df['Final Profits'].shift(9)
print(df.head(12))
Final Profits 3mos 6mos 9mos
2016-04-30 45197972 NaN NaN NaN
2016-05-31 5029292 NaN NaN NaN
2016-06-30 20310120 NaN NaN NaN
2016-07-31 10514197 -34683775.0 NaN NaN
2016-08-31 31219405 26190113.0 NaN NaN
2016-09-30 21504727 1194607.0 NaN NaN
2016-10-31 19234437 8720240.0 -25963535.0 NaN
2016-11-30 18881711 -12337694.0 13852419.0 NaN
2016-12-31 27237712 5732985.0 6927592.0 NaN
2017-01-31 21692788 2458351.0 11178591.0 -23505184.0
2017-02-28 7869701 -11012010.0 -23349704.0 2840409.0
2017-03-31 20943248 -6294464.0 -561479.0 633128.0
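If the lookup by date from the question is preferred over shift, one possible approach is to snap the shifted dates back to month end with `pd.offsets.MonthEnd(0)` before indexing. The sketch below is not from the original answer: the helper `lagged` is hypothetical, and `df_all` is a synthetic stand-in whose values are simply 0 to 35, so the differences are easy to check. It assumes every date in the index is a month end.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_all: 36 month-end rows with synthetic values.
idx = pd.date_range("2016-04-30", periods=36, freq=pd.offsets.MonthEnd())
df_all = pd.DataFrame({"Final Profits": np.arange(36, dtype=float)}, index=idx)

# The window: the most recent 12 month-end dates.
dr = df_all.index[-12:]

def lagged(frame, dates, months):
    """Rows of `frame` that lie `months` before each date, snapped to month end."""
    # MonthEnd(0) rolls a date forward to its month end and leaves
    # dates that already are month ends unchanged.
    shifted = dates - pd.DateOffset(months=months) + pd.offsets.MonthEnd(0)
    return frame.loc[shifted]

lag3 = lagged(df_all, dr, 3)

# Difference between each window value and the value 3 months earlier.
diff3 = df_all.loc[dr, "Final Profits"].to_numpy() - lag3["Final Profits"].to_numpy()
print(lag3.head())
print(diff3)
```

The same helper can be called with 6, 9, and 12 months. Because the stand-in data increases by 1 each month, every entry of `diff3` here is 3.0.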