Statsmodels ARIMA: how do I get confidence/prediction intervals?

Date: 2020-10-09 10:05:34

Tags: python statsmodels arima

How can I generate "lower" and "upper" predictions, not just "yhat"?

import statsmodels
from statsmodels.tsa.arima.model import ARIMA

assert statsmodels.__version__ == '0.12.0'

arima = ARIMA(df['value'], order=order)
model = arima.fit()

Now I can generate "yhat" predictions

yhat = model.forecast(123)

and get confidence intervals for the model parameters (but not for the predictions):

model.conf_int()

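But how do I generate yhat_lower and yhat_upper predictions?

2 answers:

Answer 0 (score: 0)

In general, the forecast and predict methods only produce point predictions, while the get_forecast and get_prediction methods produce full results that include prediction intervals.

In your example, you can do the following: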

forecast = model.get_forecast(123)
yhat = forecast.predicted_mean
yhat_conf_int = forecast.conf_int(alpha=0.05)

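If your data is a pandas Series, then yhat_conf_int will be a DataFrame with two columns, lower <name> and upper <name>, where <name> is the name of the pandas Series.

If your data is a numpy array (or a Python list), then yhat_conf_int will be an (n_forecasts, 2) array, where the first column is the lower bound of the interval and the second column is the upper bound.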

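Answer 1 (score: 0)

To generate prediction intervals rather than confidence intervals (a distinction you have drawn neatly, and which is also covered in Hyndman's blog post on the difference between prediction intervals and confidence intervals), you can follow the linked guidance.

You could also try computing bootstrapped prediction intervals, as laid out in this answer.

Below is my attempt at implementing this (I will update it when I get a chance to check it in more detail):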

from typing import Union

import numpy as np
import pandas as pd


def bootstrap_prediction_interval(y_train: Union[list, pd.Series],
                                  y_fit: Union[list, pd.Series],
                                  y_pred_value: float,
                                  alpha: float = 0.05,
                                  nbootstrap: int = None,
                                  seed: int = None):
    """
    Bootstraps a prediction interval around an ARIMA model's predictions.
    Method presented clearly here:
        - https://stats.stackexchange.com/a/254321
    Also found through here, though less clearly:
        - https://otexts.com/fpp3/prediction-intervals.html
    Can consider this to be a time-series version of the following generalisation:
        - https://saattrupdan.github.io/2020-03-01-bootstrap-prediction/

    :param y_train: List or Series of training univariate time-series data.
    :param y_fit: List or Series of model fitted univariate time-series data.
    :param y_pred_value: Float of the model predicted univariate time-series you want to compute P.I. for.
    :param alpha: float = 0.05, the significance level; the interval has nominal coverage 1 - alpha.
    :param nbootstrap: integer, the number of bootstrap samples of the residual forecast error;
        defaults to int(sqrt(n)) when None.
        Rules of thumb provided here:
            - https://stats.stackexchange.com/questions/86040/rule-of-thumb-for-number-of-bootstrap-samples
    :param seed: Integer to specify if you want deterministic sampling.

    :return: A list [`lower`, `pred`, `upper`] with `pred` being the prediction
    of the model and `lower` and `upper` constituting the lower- and upper
    bounds for the prediction interval around `pred`, respectively.
    """

    # get number of samples
    n = len(y_train)

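    # compute the forecast errors/residuals as a NumPy array, so that list
    # inputs work and the indexing below is positional whatever the Series index
    fe = np.asarray(y_train, dtype=float) - np.asarray(y_fit, dtype=float)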

    # get percentile bounds
    percentile_lower = (alpha * 100) / 2
    percentile_higher = 100 - percentile_lower

    if nbootstrap is None:
        nbootstrap = int(np.sqrt(n))
    # default_rng(None) draws fresh entropy, so no branch on seed is needed
    rng = np.random.default_rng(seed)

    # bootstrap sample (with replacement) from errors
    error_bootstrap = rng.choice(fe, size=nbootstrap, replace=True)

    # get lower and higher percentiles of sampled forecast errors
    fe_lower = np.percentile(a=error_bootstrap, q=percentile_lower)
    fe_higher = np.percentile(a=error_bootstrap, q=percentile_higher)

    # compute P.I.
    pi = [y_pred_value + fe_lower, y_pred_value, y_pred_value + fe_higher]

    return pi
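
To see the idea in isolation, here is a compact, NumPy-only sketch of the same residual bootstrap. It is a condensed variant rather than the function above: residual_bootstrap_interval is a hypothetical name, the random-walk data and the naive "previous value" forecast stand in for a real series and a fitted ARIMA model, and n_boot=1000 is an arbitrary choice.

```python
import numpy as np


def residual_bootstrap_interval(y_train, y_fit, y_pred_value,
                                alpha=0.05, n_boot=1000, seed=None):
    """Percentile interval from resampled in-sample residuals (illustrative)."""
    residuals = np.asarray(y_train, dtype=float) - np.asarray(y_fit, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample residuals with replacement.
    sampled = rng.choice(residuals, size=n_boot, replace=True)
    lower, upper = np.percentile(sampled, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return [y_pred_value + lower, y_pred_value, y_pred_value + upper]


# Assumed data: a random walk, with a naive one-step forecast as the "model".
rng = np.random.default_rng(42)
y = np.cumsum(rng.normal(size=300))
y_train = y[1:]
y_fit = y[:-1]          # naive fitted value: the previous observation
y_pred_value = y[-1]    # naive forecast for the next step

lower, pred, upper = residual_bootstrap_interval(y_train, y_fit, y_pred_value, seed=0)
print(lower, pred, upper)
```

With a fitted statsmodels ARIMA results object, y_fit would correspond to model.fittedvalues and y_pred_value to a one-step forecast. Because this approach reuses one-step in-sample errors, treat the resulting interval as a rough guide only for longer horizons.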