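I am using an ARIMA model to forecast future values of a time series. Before fitting it, I need to make the data stationary, with the seasonality and trend removed. I have read many articles on making data stationary and have written the code below, but I still cannot remove the seasonality or reach stationarity. A sample of the data: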
DATE X
1992-01-01 03:00:00 10.2
1992-01-01 06:00:00 10.4
1992-01-01 09:00:00 11.8
1992-01-01 12:00:00 12.0
1992-01-01 15:00:00 10.4
1992-01-01 18:00:00 9.4
1992-01-01 21:00:00 10.4
1992-01-02 00:00:00 13.6
1992-01-02 03:00:00 13.2
1992-01-02 06:00:00 11.8
1992-01-02 09:00:00 12.0
1992-01-02 12:00:00 12.8
1992-01-02 15:00:00 12.6
1992-01-02 18:00:00 11.0
1992-01-02 21:00:00 12.2
1992-01-03 00:00:00 13.8
1992-01-03 03:00:00 14.0
1992-01-03 06:00:00 13.4
1992-01-03 09:00:00 14.2
1992-01-03 12:00:00 16.2
1992-01-03 15:00:00 13.2
1992-01-03 18:00:00 13.4
1992-01-03 21:00:00 13.8
1992-01-04 00:00:00 14.8
1992-01-04 03:00:00 13.8
1992-01-04 06:00:00 7.6
1992-01-04 09:00:00 5.8
1992-01-04 12:00:00 4.4
1992-01-04 15:00:00 5.6
1992-01-04 18:00:00 6.0
1992-01-04 21:00:00 7.0
1992-01-05 00:00:00 6.8
1992-01-05 03:00:00 3.4
1992-01-05 06:00:00 5.8
1992-01-05 09:00:00 10.6
1992-01-05 12:00:00 9.2
1992-01-05 15:00:00 10.6
1992-01-05 18:00:00 9.8
1992-01-05 21:00:00 11.2
1992-01-06 00:00:00 12.0
1992-01-06 03:00:00 10.2
1992-01-06 06:00:00 9.0
1992-01-06 09:00:00 9.0
1992-01-06 12:00:00 8.6
1992-01-06 15:00:00 8.4
1992-01-06 18:00:00 8.2
1992-01-06 21:00:00 8.8
1992-01-07 00:00:00 10.0
1992-01-07 03:00:00 9.6
1992-01-07 06:00:00 8.0
1992-01-07 09:00:00 9.6
1992-01-07 12:00:00 10.8
1992-01-07 15:00:00 10.2
1992-01-07 18:00:00 9.8
1992-01-07 21:00:00 10.2
1992-01-08 00:00:00 9.4
1992-01-08 03:00:00 11.4
1992-01-08 06:00:00 12.6
1992-01-08 09:00:00 12.8
1992-01-08 12:00:00 10.4
1992-01-08 15:00:00 11.2
1992-01-08 18:00:00 9.0
1992-01-08 21:00:00 10.2
1992-01-09 00:00:00 8.2
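The dataset above is a DataFrame (about 70K rows in total) holding 20 years of 'X' values sampled every 3 hours on average (original data, fig 1). Because the dataset is large and complex, I prepared it first by taking the monthly mean of the complete data (monthly_mean_data, fig 2):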
import numpy as np
# dataset contains the DATE and X columns; resample to monthly means
df_monthly = dataset.resample('M', on='DATE').mean()
indexedDataset = df_monthly.copy()
# test_stationarity is my own helper; it includes adfuller and a rolling-mean analysis
test_stationarity(indexedDataset)
## Estimating trend
indexedDataset_logScale = np.log(indexedDataset)  # log-transform the monthly means
# 12-month rolling mean and standard deviation of the log series
movingAverage = indexedDataset_logScale.rolling(window=12).mean()  # 12 for monthly
movingSTD = indexedDataset_logScale.rolling(window=12).std()
# Subtract the rolling mean from the log series
datasetLogScaleMinussMovingAverage = indexedDataset_logScale - movingAverage
# Remove the NaN values left by the rolling window
datasetLogScaleMinussMovingAverage.dropna(inplace=True)
datasetLogScaleMinussMovingAverage.head(12)
test_stationarity(datasetLogScaleMinussMovingAverage)
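After all of these steps, running the test_stationarity function gave this (fig 3), which shows that the rolling mean and std are not constant, so the data is still non-stationary. I therefore wrote the following code to make the data stationary: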
exponentialDecayWeightAverage = indexedDataset_logScale.ewm(halflife=365, min_periods=0, adjust=True).mean()
datasetLogScaleMinussMovingExponentialDecayAverage = indexedDataset_logScale - exponentialDecayWeightAverage
datasetLogScaleMinussMovingExponentialDecayAverage.dropna(inplace=True)
# Difference the log series against its one-step lag so that it can be used for forecasting
# shift() defaults to one period, so this is a first difference (d=1), not d=2
datasetLogDiffShifting = indexedDataset_logScale - indexedDataset_logScale.shift()
datasetLogDiffShifting.dropna(inplace=True)
test_stationarity(datasetLogDiffShifting)
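This produced fig 4, which again shows that the rolling mean and std are not constant, so the series is still not stationary. Can anyone help me with the following? 1) Is it appropriate to take the monthly mean rather than use all of the data? 2) How can I make my data stationary?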
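The code above removes a rolling mean and takes a first difference, but it does not include seasonal differencing. For monthly data with a yearly cycle, differencing at lag 12 is a commonly suggested step. A minimal sketch of that idea, on synthetic data only: the series, its trend and cycle sizes, and the monthly_spread helper are all made up for illustration and are not my dataset.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (assumption): linear trend + annual cycle + noise.
rng = np.random.default_rng(0)
n_months = 240
index = pd.date_range("1992-01-01", periods=n_months, freq="MS")
months = np.arange(n_months)
values = (
    10
    + 0.01 * months
    + 3 * np.sin(2 * np.pi * months / 12)
    + rng.normal(scale=0.5, size=n_months)
)
monthly = pd.Series(values, index=index, name="X")

# Seasonal difference: subtract the value from the same month one year earlier.
seasonal_diff = monthly.diff(12).dropna()

# Optional first difference on top, to remove any remaining trend.
both_diff = seasonal_diff.diff().dropna()


def monthly_spread(series):
    """Range of the per-calendar-month means; large when a yearly cycle is present."""
    month_means = series.groupby(series.index.month).mean()
    return float(month_means.max() - month_means.min())


spread_before = monthly_spread(monthly)
spread_after = monthly_spread(seasonal_diff)
print(f"spread of month means before: {spread_before:.2f}, after: {spread_after:.2f}")
```

On my real data the equivalent would be df_monthly.diff(12).dropna() passed to test_stationarity, although I have not verified that this is enough to make the series stationary.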