How do I generate "lower" and "upper" predictions, not just "yhat"?
import statsmodels
from statsmodels.tsa.arima.model import ARIMA
assert statsmodels.__version__ == '0.12.0'
arima = ARIMA(df['value'], order=order)
model = arima.fit()
Now I can generate "yhat" predictions:
yhat = model.forecast(123)
and obtain confidence intervals for the model parameters (but not for the predictions):
model.conf_int()
But how do I generate yhat_lower and yhat_upper predictions?
Answer 0 (score: 0)
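In general, the forecast and predict methods produce only point predictions, while the get_forecast and get_prediction methods produce full results that include prediction intervals.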
In your example, you can do the following:
forecast = model.get_forecast(123)
yhat = forecast.predicted_mean
yhat_conf_int = forecast.conf_int(alpha=0.05)
If your data is a pandas Series, then yhat_conf_int will be a DataFrame with two columns, lower <name> and upper <name>, where <name> is the name of the pandas Series.
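If your data is a numpy array (or a Python list), then yhat_conf_int will be an (n_forecasts, 2) array, where the first column is the lower bound of the interval and the second column is the upper bound.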
Answer 1 (score: 0)
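To generate prediction intervals rather than confidence intervals (a distinction you have rightly drawn, and one also covered in Hyndman's blog post on the difference between prediction intervals and confidence intervals), you can follow this {{ 3}}.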
You can also try computing bootstrapped prediction intervals, as laid out in this answer.
Below is my attempt at implementing this (I will update it when I get the chance to check it in more detail):
from typing import Optional, Union

import numpy as np
import pandas as pd


def bootstrap_prediction_interval(y_train: Union[list, pd.Series],
                                  y_fit: Union[list, pd.Series],
                                  y_pred_value: float,
                                  alpha: float = 0.05,
                                  nbootstrap: Optional[int] = None,
                                  seed: Optional[int] = None):
    """
    Bootstraps a prediction interval around an ARIMA model's predictions.

    Method presented clearly here:
    - https://stats.stackexchange.com/a/254321
    Also found through here, though less clearly:
    - https://otexts.com/fpp3/prediction-intervals.html
    Can consider this to be a time-series version of the following generalisation:
    - https://saattrupdan.github.io/2020-03-01-bootstrap-prediction/

    :param y_train: List or Series of training univariate time-series data.
    :param y_fit: List or Series of model fitted univariate time-series data.
    :param y_pred_value: Float of the model predicted univariate time-series you want to compute P.I. for.
    :param alpha: float = 0.05, the prediction uncertainty.
    :param nbootstrap: Integer, the number of bootstrap samples of the residual forecast error.
        Defaults to int(sqrt(n)) when None. Rules of thumb provided here:
        - https://stats.stackexchange.com/questions/86040/rule-of-thumb-for-number-of-bootstrap-samples
    :param seed: Integer to specify if you want deterministic sampling.
    :return: A list [`lower`, `pred`, `upper`] with `pred` being the prediction
        of the model and `lower` and `upper` constituting the lower- and upper
        bounds for the prediction interval around `pred`, respectively.
    """
    # get number of samples
    n = len(y_train)
    # compute the forecast errors/resid; converting to arrays makes this work
    # for plain lists and for Series with a non-integer (e.g. datetime) index
    fe = np.asarray(y_train, dtype=float) - np.asarray(y_fit, dtype=float)
    # get percentile bounds
    percentile_lower = (alpha * 100) / 2
    percentile_higher = 100 - percentile_lower
    if nbootstrap is None:
        nbootstrap = int(np.sqrt(n))
    # default_rng(None) seeds from OS entropy, so no separate branch is needed
    rng = np.random.default_rng(seed)
    # bootstrap sample from errors (with replacement)
    error_bootstrap = []
    for _ in range(nbootstrap):
        idx = rng.integers(low=n)  # random position in [0, n)
        error_bootstrap.append(fe[idx])
    # get lower and higher percentiles of sampled forecast errors
    fe_lower = np.percentile(a=error_bootstrap, q=percentile_lower)
    fe_higher = np.percentile(a=error_bootstrap, q=percentile_higher)
    # compute P.I.
    pi = [y_pred_value + fe_lower, y_pred_value, y_pred_value + fe_higher]
    return pi
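As a quick way to try the residual-bootstrap idea on numbers you can inspect, here is a minimal vectorized sketch. The helper name residual_bootstrap_interval, the default of 1000 resamples, and the toy data are all assumptions made for illustration; this is a simplified variant, not the function above.

```python
import numpy as np


def residual_bootstrap_interval(residuals, y_pred_value, alpha=0.05,
                                nbootstrap=1000, seed=None):
    """Hypothetical helper: percentile interval from resampled residuals."""
    rng = np.random.default_rng(seed)
    residuals = np.asarray(residuals, dtype=float)
    # Draw residuals with replacement in a single vectorized call.
    sampled = rng.choice(residuals, size=nbootstrap, replace=True)
    # Percentiles that cut off alpha/2 of the mass in each tail.
    lower_q, upper_q = 100 * alpha / 2, 100 * (1 - alpha / 2)
    fe_lower, fe_upper = np.percentile(sampled, [lower_q, upper_q])
    return [y_pred_value + fe_lower, y_pred_value, y_pred_value + fe_upper]


# Toy data (assumed, for illustration only): observed and fitted values.
y_train = np.array([10.0, 12.0, 11.0, 13.0, 12.5, 14.0, 13.5, 15.0])
y_fit = np.array([10.5, 11.5, 11.5, 12.5, 13.0, 13.5, 14.0, 14.5])

lower, pred, upper = residual_bootstrap_interval(
    y_train - y_fit, y_pred_value=15.5, seed=42
)
print(lower, pred, upper)
```

Note that this treats the in-sample residuals as interchangeable and shifts a single point forecast by their percentiles, so it is most natural for a one-step-ahead forecast; longer horizons would need simulated future paths.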