Working with time series data in pandas

Time: 2017-09-06 09:37:38

Tags: python pandas dataframe time-series

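I have daily time series data. I am trying to correct these values by multiplying each one by a factor for its month. It amounts to a manual correction.

My time series data looks like this: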

model:

2010-01-30    0.008909
2010-01-31    0.007562
2010-02-01    0.012377
2010-02-02    0.010286
2010-02-03    0.012244
2010-02-04    0.011367
2010-02-05    0.010800
2010-02-06    0.007610
2010-02-07    0.006534
2010-02-08    0.004721
                ...   
2015-12-02    0.005415
2015-12-03    0.004358
2015-12-04    0.006844
2015-12-05    0.002373

I have one factor per month:

mon_slope:

month
January     -0.168627
February    -0.165102
March       -0.112321
April       -0.112232
May         -0.080092
June        -0.129905
July        -0.078751
August      -0.095756
September   -0.090188
October     -0.109919
November    -0.155380
December    -0.137885
Name: slope, dtype: float64

What I did:

jan_corr = pd.DataFrame(model[model.index.month ==1]*mon_slope.ix[0][1])
feb_corr = pd.DataFrame(model[model.index.month ==2]*mon_slope.ix[1][1])
mar_corr = pd.DataFrame(model[model.index.month ==3]*mon_slope.ix[2][1])

..................
..................

final = pd.concat([jan_corr,feb_corr,mar_corr])

But I am sure this is not the right way to do it. Is there a simpler way?

1 answer:

Answer 0: (score: 1)

First, create a mapping from month name to factor:

mapping = dict(months.values)
mapping

{'April': -0.112232,
 'August': -0.095756,
 'December': -0.13788499999999998,
 'February': -0.165102,
 'January': -0.168627,
 'July': -0.078751,
 'June': -0.129905,
 'March': -0.112321,
 'May': -0.080092,
 'November': -0.15538,
 'October': -0.109919,
 'September': -0.090188}
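As a minimal sketch of why `dict(months.values)` works, assume `months` is a two-column frame of month names and factors, as in the listing at the end of this answer. The three-row frame below is a shortened, hypothetical reconstruction, and `mapping_explicit` is an illustrative name for an equivalent spelling.

```python
import pandas as pd

# Hypothetical reconstruction of the `months` frame (first three rows only).
months = pd.DataFrame([
    ["January", -0.168627],
    ["February", -0.165102],
    ["March", -0.112321],
])

# Each row of months.values is a (name, factor) pair, which dict() accepts.
mapping = dict(months.values)

# Equivalent, more explicit spelling: index by the name column, then to_dict().
mapping_explicit = months.set_index(0)[1].to_dict()

print(mapping == mapping_explicit)
```

The explicit form may be easier to read when the frame has named columns or more than two of them.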

You can use Series.dt.strftime to retrieve the month names:

df.iloc[:, 0].dt.strftime('%B')
Out[143]: 
0      January
1      January
2     February
3     February
4     February
5     February
6     February
7     February
8     February
9     February
10    December
11    December
12    December
13    December
Name: 0, dtype: object
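The `.dt` accessor only works when the column already holds datetimes. The sketch below assumes the dates arrive as plain strings (the `dates` Series is hypothetical) and converts them first. It also uses `Series.dt.month_name()`, which is available in recent pandas versions and returns English names by default, whereas the output of `strftime('%B')` can depend on the process locale.

```python
import pandas as pd

# Hypothetical date column stored as plain strings, as it might be after read_csv.
dates = pd.Series(["2010-01-30", "2010-02-01", "2015-12-02"])

# The .dt accessor requires datetime-like data, so convert first.
dates = pd.to_datetime(dates)

# month_name() returns English month names unless a locale is passed.
names = dates.dt.month_name()
print(names.tolist())
```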

Now use this with Series.map (Series.replace would also work) to look up the multiplier for each row:

df.iloc[:, 1] = df.iloc[:, 0].dt.strftime('%B').map(mapping) * df.iloc[:, 1]
df

            0         1
0  2010-01-30 -0.001502
1  2010-01-31 -0.001275
2  2010-02-01 -0.002043
3  2010-02-02 -0.001698
4  2010-02-03 -0.002022
5  2010-02-04 -0.001877
6  2010-02-05 -0.001783
7  2010-02-06 -0.001256
8  2010-02-07 -0.001079
9  2010-02-08 -0.000779
10 2015-12-02 -0.000747
11 2015-12-03 -0.000601
12 2015-12-04 -0.000944
13 2015-12-05 -0.000327
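The same idea can be applied directly to the layout in the question, where `model` is a Series with a DatetimeIndex and `mon_slope` is a Series indexed by month name. This is a sketch on a three-row sample, not the full data. `month_names`, `factors`, and `corrected` are illustrative names, and it assumes a pandas version that provides `DatetimeIndex.month_name()`.

```python
import pandas as pd

# Small sample of the question's data: a daily Series with a DatetimeIndex.
model = pd.Series(
    [0.008909, 0.012377, 0.005415],
    index=pd.to_datetime(["2010-01-30", "2010-02-01", "2015-12-02"]),
)

# Monthly factors indexed by month name (only the months needed here).
mon_slope = pd.Series(
    {"January": -0.168627, "February": -0.165102, "December": -0.137885},
    name="slope",
)

# Look up one factor per date, keeping the original index so the
# multiplication below aligns row by row.
month_names = pd.Series(model.index.month_name(), index=model.index)
factors = month_names.map(mon_slope)

corrected = model * factors
print(corrected)
```

This replaces the twelve per-month slices and the final `pd.concat` with a single vectorized multiplication, and it keeps the rows in their original date order.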

Details (the input data used above):

df

            0         1
0  2010-01-30  0.008909
1  2010-01-31  0.007562
2  2010-02-01  0.012377
3  2010-02-02  0.010286
4  2010-02-03  0.012244
5  2010-02-04  0.011367
6  2010-02-05  0.010800
7  2010-02-06  0.007610
8  2010-02-07  0.006534
9  2010-02-08  0.004721
10 2015-12-02  0.005415
11 2015-12-03  0.004358
12 2015-12-04  0.006844
13 2015-12-05  0.002373

months

            0         1
0     January -0.168627
1    February -0.165102
2       March -0.112321
3       April -0.112232
4         May -0.080092
5        June -0.129905
6        July -0.078751
7      August -0.095756
8   September -0.090188
9     October -0.109919
10   November -0.155380
11   December -0.137885