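I am trying to run the Dickey-Fuller test with statsmodels in Python, but I get an error. I am running Python 2.7 with pandas version 0.19.2. The dataset comes from GitHub and was imported as-is.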
import pandas as pd
import statsmodels.tsa.stattools as ts
from statsmodels.tsa.stattools import adfuller

def test_stationarity(timeseries):
    # Perform Dickey-Fuller test:
    print 'Results of Dickey-Fuller Test:'
    dftest = ts.adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print dfoutput

test_stationarity(tr)
This gives me the following error:
Results of Dickey-Fuller Test:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-15-10ab4b87e558> in <module>()
----> 1 test_stationarity(tr)
<ipython-input-14-d779e1ed35b3> in test_stationarity(timeseries)
19 #Perform Dickey-Fuller test:
20 print 'Results of Dickey-Fuller Test:'
---> 21 dftest = ts.adfuller(timeseries, autolag='AIC' )
22 #dftest = adfuller(timeseries, autolag='AIC')
23 dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
C:\Users\SONY\Anaconda2\lib\site-packages\statsmodels\tsa\stattools.pyc in adfuller(x, maxlag, regression, autolag, store, regresults)
209
210 xdiff = np.diff(x)
--> 211 xdall = lagmat(xdiff[:, None], maxlag, trim='both', original='in')
212 nobs = xdall.shape[0] # pylint: disable=E1103
213
C:\Users\SONY\Anaconda2\lib\site-packages\statsmodels\tsa\tsatools.pyc in lagmat(x, maxlag, trim, original)
322 if x.ndim == 1:
323 x = x[:,None]
--> 324 nobs, nvar = x.shape
325 if original in ['ex','sep']:
326 dropidx = nvar
ValueError: too many values to unpack
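Answer 0 (score: 9)
tr must be a 1-D array, as the adfuller documentation states. In your case I don't know what tr is. Assuming you defined tr as a DataFrame containing the time series data, you should do the following: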
tr = tr.iloc[:,0].values
adfuller will then be able to read the data.
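To see why a 2-D input triggers this particular ValueError, here is a minimal sketch that replays the two operations visible in the traceback (np.diff followed by [:, None]) on a small made-up array. The array values and the variable names are assumptions for illustration only; it does not call statsmodels itself.

```python
import numpy as np

# Hypothetical stand-in for the values of a single-column DataFrame: shape (5, 1).
x2d = np.array([[1.0], [2.0], [4.0], [7.0], [11.0]])

# Same two steps as in the traceback: difference, then add an axis.
# np.diff works along the last axis, so a (5, 1) input becomes (5, 0),
# and [:, None] turns that into a 3-D array.
xdiff_2d = np.diff(x2d)[:, None]
print(xdiff_2d.ndim)

# Unpacking a 3-element shape into two names is what raises the error.
try:
    nobs, nvar = xdiff_2d.shape
    unpack_failed = False
except ValueError:
    unpack_failed = True
print(unpack_failed)

# Selecting the column first gives a 1-D array, and the same steps
# then produce the 2-D (nobs, nvar) shape that the unpacking expects.
x1d = x2d[:, 0]
xdiff_1d = np.diff(x1d)[:, None]
nobs, nvar = xdiff_1d.shape
print(nobs, nvar)
```

This is why tr.iloc[:,0].values helps: it hands adfuller a 1-D array instead of a column-shaped 2-D one.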
Answer 1 (score: 2)
Just change the line to:
dftest = adfuller(timeseries.iloc[:,0].values, autolag='AIC' )
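It will work. adfuller expects a one-dimensional array-like input. In your case you are passing a DataFrame, so either select the column explicitly or edit the line as shown above.
Answer 2 (score: 2)
I assume you are running the Dickey-Fuller test. You probably want to keep the datetime column of the time series as the index. To do that: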
tr = tr.set_index('Month')  # assuming the time column is named 'Month'
ts = tr['othercolumnname']  # replace with the name of the value column; it might be a count or anything else
I hope this helps.