Getting lagged data in pandas

Time: 2019-04-14 07:28:20

Tags: python pandas

I want to get lagged data from a dataset. The dataset is monthly and looks like this:

           Final Profits
JCCreateDate    
2016-04-30  31163371.59
2016-05-31  27512300.34
...
2019-02-28  16800693.82
2019-03-31  5384227.13

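From the dataset above, I have selected a window of data (the most recent 12 months), and I want to shift its dates back by 3, 6, 9, and 12 months.

I created the window dataset like this: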

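import pandas as pd

# Assumption: JCCreateDate is the date column, parsed here as the index
df_all = pd.read_csv('dataset.csv', index_col='JCCreateDate', parse_dates=True)
df = pd.read_csv('window_dataset.csv', index_col='JCCreateDate', parse_dates=True)
data_start, data_end = pd.to_datetime(df.first_valid_index()), pd.to_datetime(df.last_valid_index())
dr = pd.date_range(data_start, data_end, freq='M')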

Now I want to subtract months from the date range dr. Say I subtract 3 months from dr and then try to retrieve the matching rows from df_all:

df_all.loc[dr - pd.DateOffset(months=3)]

This gives me the following output:

            Final Profits
2018-01-30  NaN
2018-02-28  9240766.46
2018-03-30  NaN
2018-04-30  13250515.05
2018-05-31  12539224.15
2018-06-30  17778326.04
2018-07-31  19345671.02
2018-08-30  NaN
2018-09-30  14815607.14
2018-10-31  28979099.74
2018-11-28  NaN
2018-12-31  12395273.24

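As you can see, I get some NaNs. Months such as January and March have 31 days, so the subtraction lands on a date (for example 2018-01-30) that is not the month-end date stored in df_all, and the lookup finds nothing. How can I handle this?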
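
To illustrate the mismatch described above, here is a minimal sketch using three of the window's month-end dates (typed in by hand here, not read from the CSV files). It shows that pd.DateOffset(months=3) keeps the day-of-month where it can, so a 30th stays a 30th even when the target month ends on the 31st.

```python
import pandas as pd

# Three month-end dates like those in the question's window index.
dr = pd.to_datetime(['2018-04-30', '2018-05-31', '2018-06-30'])

# DateOffset(months=3) preserves the day-of-month when it exists in the
# target month, and only clips it when it does not (31 May -> 28 Feb).
shifted = dr - pd.DateOffset(months=3)

print(shifted)
print(shifted.is_month_end)  # flags which results are real month-end dates
```

Only the dates flagged as month-ends can match rows in a month-end-indexed frame, which lines up with the NaN rows in the output above.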

1 answer:

Answer 0: (score: 0)

I'm not 100% sure what you're looking for, but I suspect shift would do it.

import numpy as np
import pandas as pd

# set up a sample dataframe with a month-end index and random values
index = pd.date_range(start='2016-04-30', end='2019-03-31', freq='M')
df = pd.DataFrame(np.random.randint(5000000, 50000000, 36), index=index, columns=['Final Profits'])

# create three columns, each subtracting a shifted copy of 'Final Profits'
df['3mos'] = df['Final Profits'] - df['Final Profits'].shift(3)
df['6mos'] = df['Final Profits'] - df['Final Profits'].shift(6)
df['9mos'] = df['Final Profits'] - df['Final Profits'].shift(9)

print(df.head(12))

         Final Profits        3mos        6mos        9mos
2016-04-30       45197972         NaN         NaN         NaN
2016-05-31        5029292         NaN         NaN         NaN
2016-06-30       20310120         NaN         NaN         NaN
2016-07-31       10514197 -34683775.0         NaN         NaN
2016-08-31       31219405  26190113.0         NaN         NaN
2016-09-30       21504727   1194607.0         NaN         NaN
2016-10-31       19234437   8720240.0 -25963535.0         NaN
2016-11-30       18881711 -12337694.0  13852419.0         NaN
2016-12-31       27237712   5732985.0   6927592.0         NaN
2017-01-31       21692788   2458351.0  11178591.0 -23505184.0
2017-02-28        7869701 -11012010.0 -23349704.0   2840409.0
2017-03-31       20943248  -6294464.0   -561479.0    633128.0
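
If the goal is instead to keep the date lookup from the question, one possible alternative (not part of the answer above) is to subtract pd.offsets.MonthEnd(3) rather than pd.DateOffset(months=3). The sketch below assumes every date in the window is already a month-end date. The frame df_all here is a made-up stand-in with placeholder values, not the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for df_all: 18 month-end rows, 2017-10-31 to 2019-03-31.
idx = pd.date_range('2017-10-31', periods=18, freq=pd.offsets.MonthEnd())
df_all = pd.DataFrame({'Final Profits': range(18)}, index=idx)

# The 12-month window: 2018-04-30 through 2019-03-31.
dr = df_all.index[-12:]

# MonthEnd(3) steps back three month-ends, so 2018-04-30 becomes 2018-01-31
# instead of 2018-01-30 (assumes the inputs are month-end dates).
lagged_dates = dr - pd.offsets.MonthEnd(3)

# reindex returns NaN for any date still missing instead of raising.
lagged = df_all.reindex(lagged_dates)
print(lagged)
```

The same pattern with MonthEnd(6), MonthEnd(9), and MonthEnd(12) would cover the other lags, provided df_all reaches far enough back.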