I want to use pandas on Python 2.7 to compute a rolling mean of m_tax, as part of analyzing time series data from this web page (http://owww.met.hu/eghajlat/eghajlati_adatsorok/bp/Navig/202_EN.htm).
datum m_ta m_tax m_taxd m_tan m_tand
------- ----- ----- ---------- ----- ----------
1901-01 -4.7 5.0 1901-01-23 -12.2 1901-01-10
1901-02 -2.1 3.5 1901-02-06 -7.9 1901-02-15
1901-03 5.8 13.5 1901-03-20 0.6 1901-03-01
1901-04 11.6 18.2 1901-04-10 7.4 1901-04-23
1901-05 16.8 22.5 1901-05-31 12.2 1901-05-05
1901-06 21.0 24.8 1901-06-03 14.6 1901-06-17
1901-07 22.4 27.4 1901-07-30 16.9 1901-07-04
1901-08 20.7 25.9 1901-08-01 14.7 1901-08-29
....
Here is the code I tried:
pd.rolling_mean(df.resample("1M", fill_method="ffill"), window=60, min_periods=1, center=True).mean()
I got this result:
m_ta 11.029173
m_tax 17.104283
m_tan 4.848637
month 6.499500
monthly_mean 11.030405
monthly_std 1.836159
m_tax% 0.083348
m_tan% 0.023627
dtype: float64
I also tried another approach:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/1900', periods=1000))
s = s.cumsum()
r = s.rolling(window=60)
r.mean()
and got this result:
1900-01-01 NaN
1900-01-02 NaN
1900-01-03 NaN
1900-01-04 NaN
1900-01-05 NaN
1900-01-06 NaN
1900-01-07 NaN
1900-01-08 NaN
...
So I'm confused. Which one should I use? Can anyone give me some pointers? Thanks!
Answer 0 (score: 0)
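As of version 0.18.0, rolling() and resample() are both methods that behave similarly to groupby(), and the older function forms (such as pd.rolling_mean) are deprecated.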
What's new in pandas version 0.18.0
rolling()/expanding() in pandas version 0.18.0
resample() in pandas version 0.18.0
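As a rough illustration of the method-style API, here is a minimal sketch on a synthetic daily series (the values are made up, not the Budapest data). It also suggests why the second attempt in the question starts with NaN: when min_periods is not given, an integer window requires a full window of observations before it produces a value.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (illustrative values only, not the real data)
s = pd.Series(np.arange(1000, dtype=float),
              index=pd.date_range("1900-01-01", periods=1000))

# min_periods defaults to the window size, so the first 59 results are NaN
strict = s.rolling(window=60).mean()

# Allowing partial windows yields a value from the very first row
relaxed = s.rolling(window=60, min_periods=1).mean()

print(strict.isna().sum())
print(relaxed.isna().sum())
```

Both calls return a series of the same length as the input; only the handling of incomplete windows differs.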
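I can't tell exactly what result you're after, but maybe something like this is what you want? (You can see a warning message below, though I'm not sure what triggers it.)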
>>> df
m_ta m_tax m_taxd m_tan m_tand
datum
1901-01-01 -4.7 5.0 1901-01-23 -12.2 1901-01-10
1901-02-01 -2.1 3.5 1901-02-06 -7.9 1901-02-15
1901-03-01 5.8 13.5 1901-03-20 0.6 1901-03-01
1901-04-01 11.6 18.2 1901-04-10 7.4 1901-04-23
1901-05-01 16.8 22.5 1901-05-31 12.2 1901-05-05
1901-06-01 21.0 24.8 1901-06-03 14.6 1901-06-17
1901-07-01 22.4 27.4 1901-07-30 16.9 1901-07-04
1901-08-01 20.7 25.9 1901-08-01 14.7 1901-08-29
>>> df.resample("1M").rolling(3,center=True,min_periods=1).mean()
/Users/john/anaconda/lib/python3.5/site-packages/ipykernel/__main__.py:1: FutureWarning: .resample() is now a deferred operation
use .resample(...).mean() instead of .resample(...)
if __name__ == '__main__':
m_ta m_tax m_tan
datum
1901-01-31 -3.400000 4.250000 -10.050000
1901-02-28 -0.333333 7.333333 -6.500000
1901-03-31 5.100000 11.733333 0.033333
1901-04-30 11.400000 18.066667 6.733333
1901-05-31 16.466667 21.833333 11.400000
1901-06-30 20.066667 24.900000 14.566667
1901-07-31 21.366667 26.033333 15.400000
1901-08-31 21.550000 26.650000 15.800000
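Judging by the warning text, it is probably triggered by calling .rolling() directly on the result of .resample() without aggregating first. A minimal sketch that aggregates before rolling, using only the eight rows shown above, might look like the following. The "MS" (month start) frequency and the variable names monthly and smoothed are my own choices for illustration, and the date-string columns are left out because they cannot be averaged.

```python
import pandas as pd

# The eight sample rows from the question (numeric columns only)
df = pd.DataFrame(
    {
        "m_ta": [-4.7, -2.1, 5.8, 11.6, 16.8, 21.0, 22.4, 20.7],
        "m_tax": [5.0, 3.5, 13.5, 18.2, 22.5, 24.8, 27.4, 25.9],
        "m_tan": [-12.2, -7.9, 0.6, 7.4, 12.2, 14.6, 16.9, 14.7],
    },
    index=pd.date_range("1901-01-01", periods=8, freq="MS"),
)
df.index.name = "datum"

# Aggregate explicitly after resampling, then roll.
# The data is already monthly, so this step leaves the values unchanged.
monthly = df.resample("MS").mean()

# Centered 3-month rolling mean of m_tax; partial windows allowed at the edges
smoothed = monthly["m_tax"].rolling(window=3, center=True, min_periods=1).mean()
print(smoothed)
```

For a five-year window, window=60 would replace window=3. Note also that the trailing .mean() in the question's first attempt averages each rolled column over the whole period, which is why that attempt returned a single number per column rather than a time series.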