Making data stationary for an ARIMA model using Python

Time: 2019-11-26 14:13:43

Tags: python dataframe time-series forecasting arima

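I am using an ARIMA model to forecast future values of a time series. Before fitting it, I need to remove the seasonality and trend from the data and make it stationary. I have read many articles about making data stationary and have written the code below, but I still cannot remove the seasonality or achieve stationarity. A sample of the data: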

        DATE             X
 1992-01-01 03:00:00    10.2
 1992-01-01 06:00:00    10.4
 1992-01-01 09:00:00    11.8
 1992-01-01 12:00:00    12.0
 1992-01-01 15:00:00    10.4
 1992-01-01 18:00:00    9.4
 1992-01-01 21:00:00    10.4
 1992-01-02 00:00:00    13.6
 1992-01-02 03:00:00    13.2
 1992-01-02 06:00:00    11.8
 1992-01-02 09:00:00    12.0
 1992-01-02 12:00:00    12.8
 1992-01-02 15:00:00    12.6
 1992-01-02 18:00:00    11.0
 1992-01-02 21:00:00    12.2
 1992-01-03 00:00:00    13.8
 1992-01-03 03:00:00    14.0
 1992-01-03 06:00:00    13.4
 1992-01-03 09:00:00    14.2
 1992-01-03 12:00:00    16.2
 1992-01-03 15:00:00    13.2
 1992-01-03 18:00:00    13.4
 1992-01-03 21:00:00    13.8
 1992-01-04 00:00:00    14.8
 1992-01-04 03:00:00    13.8
 1992-01-04 06:00:00    7.6
 1992-01-04 09:00:00    5.8
 1992-01-04 12:00:00    4.4
 1992-01-04 15:00:00    5.6
 1992-01-04 18:00:00    6.0
 1992-01-04 21:00:00    7.0
 1992-01-05 00:00:00    6.8
 1992-01-05 03:00:00    3.4
 1992-01-05 06:00:00    5.8
 1992-01-05 09:00:00    10.6
 1992-01-05 12:00:00    9.2
 1992-01-05 15:00:00    10.6
 1992-01-05 18:00:00    9.8
 1992-01-05 21:00:00    11.2
 1992-01-06 00:00:00    12.0
 1992-01-06 03:00:00    10.2
 1992-01-06 06:00:00    9.0
 1992-01-06 09:00:00    9.0
 1992-01-06 12:00:00    8.6
 1992-01-06 15:00:00    8.4
 1992-01-06 18:00:00    8.2
 1992-01-06 21:00:00    8.8
 1992-01-07 00:00:00    10.0
 1992-01-07 03:00:00    9.6
 1992-01-07 06:00:00    8.0
 1992-01-07 09:00:00    9.6
 1992-01-07 12:00:00    10.8
 1992-01-07 15:00:00    10.2
 1992-01-07 18:00:00    9.8
 1992-01-07 21:00:00    10.2
 1992-01-08 00:00:00    9.4
 1992-01-08 03:00:00    11.4
 1992-01-08 06:00:00    12.6
 1992-01-08 09:00:00    12.8
 1992-01-08 12:00:00    10.4
 1992-01-08 15:00:00    11.2
 1992-01-08 18:00:00    9.0
 1992-01-08 21:00:00    10.2
 1992-01-09 00:00:00    8.2

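The dataset above is a DataFrame (about 70K rows in total) holding 20 years of 'X' values sampled every 3 hours (Original data, fig. 1). Because the dataset is large and complex, I prepared it by taking the monthly mean of the full data (monthly_mean_data, fig. 2):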

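import numpy as np
import pandas as pd

df_monthly = dataset.resample('M', on='DATE').mean()  # dataset contains DATE and X values
indexedDataset = df_monthly.copy()   # was monthly.copy(), but no variable named monthly is defined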
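For reference, the monthly aggregation can be reproduced on synthetic data. This is only a minimal sketch: the values are random stand-ins for the real readings, and `monthly_mean` is a hypothetical helper name. It groups by calendar month through `dt.to_period('M')`, which should give the same means as `resample('M', on='DATE')`, only indexed by period instead of by month-end timestamp.

```python
import numpy as np
import pandas as pd

def monthly_mean(dataset: pd.DataFrame) -> pd.Series:
    """Average column 'X' per calendar month (hypothetical helper)."""
    months = dataset["DATE"].dt.to_period("M")
    return dataset.groupby(months)["X"].mean()

# Synthetic stand-in for the real data: 3-hourly readings covering 1992 and 1993.
rng = np.random.default_rng(0)
dates = pd.date_range("1992-01-01 00:00", periods=8 * 731,
                      freq=pd.Timedelta(hours=3))
dataset = pd.DataFrame({"DATE": dates, "X": 10 + rng.normal(size=len(dates))})

df_monthly = monthly_mean(dataset)
print(len(dataset), "3-hourly rows ->", len(df_monthly), "monthly means")
```

Monthly averaging turns 20 years of data into roughly 240 points and averages out any daily cycle, so only patterns longer than a month (for example an annual one) remain visible to the model.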

test_stationarity(indexedDataset)   # my own helper: runs adfuller and a rolling-mean analysis

## Estimating trend
indexedDataset_logScale = np.log(indexedDataset)    # log-transform the monthly means

# Subtract the 12-month moving average from the log-scaled 'X' values

movingAverage = indexedDataset_logScale.rolling(window=12).mean()    # 12 for monthly
movingSTD = indexedDataset_logScale.rolling(window=12).std()

# Detrending: log series minus its moving average
datasetLogScaleMinussMovingAverage = indexedDataset_logScale - movingAverage
# Remove the NaN values left by the rolling window
datasetLogScaleMinussMovingAverage.dropna(inplace=True)
datasetLogScaleMinussMovingAverage.head(12)

test_stationarity(datasetLogScaleMinussMovingAverage) 

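After all of this, running the test_stationarity function gave me this result (fig. 3), which shows that my rolling mean and std are not constant, so the data is still not stationary. I therefore wrote the following code to make the data stationary: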

exponentialDecayWeightAverage = indexedDataset_logScale.ewm(halflife=365, min_periods=0, adjust=True).mean()
datasetLogScaleMinussMovingExponentialDecayAverage = indexedDataset_logScale - exponentialDecayWeightAverage
datasetLogScaleMinussMovingExponentialDecayAverage.dropna(inplace=True)   # a bare dropna() would discard its result

# First-order differencing of the log series, to be used for forecasting

datasetLogDiffShifting = indexedDataset_logScale - indexedDataset_logScale.shift()   # one difference, i.e. d=1

datasetLogDiffShifting.dropna(inplace=True)

test_stationarity(datasetLogDiffShifting)
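One detail of the `ewm` call is worth checking. With a numeric `halflife`, pandas measures it in observations, so on monthly data `halflife=365` means 365 months, which is longer than the roughly 240 months in a 20-year series. The sketch below uses a made-up linear series to show that such a long half-life behaves much like a plain expanding mean. If a one-year half-life was intended (an assumption about intent), 12 would be the matching value.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: 240 points of a plain linear trend.
monthly = pd.Series(np.arange(240, dtype=float))

expanding_mean = monthly.expanding().mean()
ewm_365 = monthly.ewm(halflife=365, min_periods=0, adjust=True).mean()
ewm_12 = monthly.ewm(halflife=12, min_periods=0, adjust=True).mean()

# Distance of each exponentially weighted mean from the expanding mean
# at the last observation.
gap_365 = (ewm_365 - expanding_mean).abs().iloc[-1]
gap_12 = (ewm_12 - expanding_mean).abs().iloc[-1]
print(f"halflife=365 gap: {gap_365:.2f}, halflife=12 gap: {gap_12:.2f}")
```

A weighted mean that barely favours recent values tracks the trend poorly, so subtracting it leaves most of the trend in the series.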

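This produced fig. 4, which again shows that the rolling mean and std are not constant, so the series is still not stationary. Can someone help me with the following? 1) Is it appropriate to take the monthly mean instead of using all of the data? 2) How can I make my data stationary?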
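None of the steps in the post targets seasonality directly: subtracting a moving average and taking a first difference both address trend. One common option is seasonal differencing, sketched below on synthetic data. It assumes an annual cycle, which is a period of 12 for monthly means; that is an assumption about the data, not something the post confirms, and the series here is made up.

```python
import numpy as np
import pandas as pd

# Synthetic log-scale monthly series: level + linear trend + 12-month cycle.
months = np.arange(240)
trend = 0.01 * months
seasonal = 0.3 * np.sin(2 * np.pi * months / 12)
log_series = pd.Series(2.0 + trend + seasonal)

# Seasonal difference: each value minus the value 12 months earlier.
# The 12-month cycle cancels and the linear trend becomes a constant (12 * 0.01).
seasonal_diff = log_series.diff(12).dropna()

# A further first difference removes that remaining constant.
both_diff = seasonal_diff.diff().dropna()

print(seasonal_diff.describe())
print(both_diff.abs().max())
```

In ARIMA terms this corresponds to a seasonal model with one seasonal difference at period 12. On real data the result would still need to be checked with the stationarity test rather than assumed.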

0 answers:

No answers