Question

I have a vector of time series model names, shown below (treat each name in the vector as a model):

  [1] "ARIMA(2,1,0) with drift" "ARIMA(2,0,0) with non-zero mean" "ARIMA(2,0,0) with non-zero mean" "ARIMA(2,0,0) with non-zero mean" "ARIMA(0,0,1) with non-zero mean"

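Each of these strings contains five distinct parts:

1) Model name: there is always a model name before the opening parenthesis. Here "ARIMA" is the model name (ARIMA, short for AutoRegressive Integrated Moving Average, is a forecasting technique that projects the future values of a series based entirely on its own inertia).

2) Autoregressive part (the AR part, called "p"): the first number inside the parentheses, before the first comma. For the vector above, the AR values are 2,2,2,2,0.

3) Differencing part (called "d"): the second element inside the parentheses, after the first comma. In this example the values are 1,0,0,0,0.

4) Moving average part (the MA part, called "q"): the last element inside the parentheses. In this example the values are 0,0,0,0,1.

5) The text after "with", which takes one of two forms here: drift or non-zero mean.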

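The problem is that I need to extract these elements from the vector of models.

Given the model names, I want to write a program that extracts the following: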

 1. Name of the model eg: ARIMA 
 2. Number of AR coefficients 
 3. Number of MA coefficients 
 4. Order of differencing 
 5. Whether the model has a drift or not 
 6. Whether the model has a non-zero mean or not

My output should look like this:

   Model p d q outcome_with_drift outcome_with_non_zero_mean
 1 ARIMA 2 1 0                  1                          0
 2 ARIMA 2 0 0                  0                          1
 3 ARIMA 2 0 0                  0                          1
 4 ARIMA 2 0 0                  0                          1
 5 ARIMA 0 0 1                  0                          1

Answer 1

You can use library(stringr) to split the vector into separate columns. For example, if vect is a vector with the following input:

vect <- c("ARIMA(2,1,0) with drift", "ARIMA(2,0,0) with non-zero mean" ,"ARIMA(2,0,0) with non-zero mean" ,
          "ARIMA(2,0,0) with non-zero mean" ,"ARIMA(0,0,1) with non-zero mean")

Then use str_split_fixed to split it into separate columns, as follows:

library(stringr)

df <- data.frame(str_split_fixed(vect,"\\s|\\(|\\)|,",n=5))
### Here the separators are whitespace (\\s), parentheses (\\( and \\)) and commas (,);
### with n=5 the fifth piece keeps the rest of the string (e.g. " with drift")

names(df) <- c("Model","p","d","q","outcome")
# Rename the columns to match the question:
# p is the number of autoregressive (AR) terms
# d is the number of nonseasonal differences needed for stationarity (order of differencing)
# q is the number of lagged forecast errors in the prediction equation (MA terms)

df$outcome_ <- gsub("\\s|-","_",trimws(df$outcome))
# Clean the outcome column by replacing spaces and dashes with underscores
dummy_mat <- data.frame(model.matrix(~outcome_-1,data=df))
# Use model.matrix to build the dummy columns for drift and non-zero mean (1 means TRUE, 0 means FALSE)
df_final <- data.frame(df[,1:4],dummy_mat)

Result:

#   Model p d q outcome_with_drift outcome_with_non_zero_mean
# 1 ARIMA 2 1 0                  1                          0
# 2 ARIMA 2 0 0                  0                          1
# 3 ARIMA 2 0 0                  0                          1
# 4 ARIMA 2 0 0                  0                          1
# 5 ARIMA 0 0 1                  0                          1
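
As an alternative, here is a minimal base R sketch that does not need stringr. It is not part of the answer above: the helper name parse_arima_labels is hypothetical, and it assumes every label has the form NAME(p,d,q) optionally followed by text such as "with drift". Unlike model.matrix, which only creates columns for the suffixes that actually occur in the data, it always returns both flag columns, and it returns p, d and q as integers rather than strings.

```r
# Hypothetical helper (an illustration, not from the original answer).
# Assumes labels look like "NAME(p,d,q)" with an optional trailing description.
parse_arima_labels <- function(labels) {
  # Capture groups: model name, the three orders, and any trailing text
  pattern <- "^([A-Za-z]+)\\(([0-9]+),([0-9]+),([0-9]+)\\)[[:space:]]*(.*)$"
  matches <- regmatches(labels, regexec(pattern, labels))
  # Each match holds the full string plus 5 groups; fail early otherwise
  stopifnot(lengths(matches) == 6)
  parts <- do.call(rbind, matches)
  data.frame(
    Model = parts[, 2],
    p = as.integer(parts[, 3]),
    d = as.integer(parts[, 4]),
    q = as.integer(parts[, 5]),
    outcome_with_drift = as.integer(grepl("with drift", parts[, 6], fixed = TRUE)),
    outcome_with_non_zero_mean = as.integer(grepl("non-zero mean", parts[, 6], fixed = TRUE)),
    stringsAsFactors = FALSE
  )
}

vect <- c("ARIMA(2,1,0) with drift", "ARIMA(2,0,0) with non-zero mean",
          "ARIMA(2,0,0) with non-zero mean", "ARIMA(2,0,0) with non-zero mean",
          "ARIMA(0,0,1) with non-zero mean")
parsed <- parse_arima_labels(vect)
print(parsed)
```

A label with no trailing text, such as "ARIMA(1,1,1)", should come back with both flags set to 0 under this sketch.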

How can I extract model information from model names?
