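How do I extract model information from model names?

Time: 2017-04-25 07:39:33

Tags: r

I have a vector of time series model names, shown below. Treat each name in the vector as a model: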

  [1] "ARIMA(2,1,0) with drift" "ARIMA(2,0,0) with non-zero mean" "ARIMA(2,0,0) with non-zero mean" "ARIMA(2,0,0) with non-zero mean" "ARIMA(0,0,1) with non-zero mean"

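Each name contains five distinct parts:

1) Model name: there is always a model name before the parentheses. Here "ARIMA" is the model name (ARIMA, short for AutoRegressive Integrated Moving Average, is a forecasting technique that predicts future values of a series based entirely on the series' own inertia).

2) Autoregressive part (the AR part, called "p"): the first number after the opening parenthesis, before the first comma. For the vector above, the AR values are 2, 2, 2, 2, 0.

3) Differencing part (called "d"): the second element in the parentheses, after the first comma. In this example the values are 1, 0, 0, 0, 0.

4) Moving average part (the MA part, called "q"): the last element in the parentheses. In this example the values are 0, 0, 0, 0, 1.

5) The part after "with", which here is either "drift" or "non-zero mean".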

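The problem is that I need to extract these elements from the vector of model names.

By looking at the model names, I want to write a program that extracts the following: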

 1. Name of the model, e.g. ARIMA
 2. Number of AR coefficients (p)
 3. Number of MA coefficients (q)
 4. Order of differencing (d)
 5. Whether the model has a drift or not
 6. Whether it has a non-zero mean or not

My output should look like this:

   Model p d q outcome_with_drift outcome_with_non_zero_mean
 1 ARIMA 2 1 0                  1                          0
 2 ARIMA 2 0 0                  0                          1
 3 ARIMA 2 0 0                  0                          1
 4 ARIMA 2 0 0                  0                          1
 5 ARIMA 0 0 1                  0                          1

1 answer:

Answer 0 (score: 2)

You can use library(stringr) to split the vector into separate columns. For example, suppose vect is a vector holding the following input:

vect <- c("ARIMA(2,1,0) with drift", "ARIMA(2,0,0) with non-zero mean" ,"ARIMA(2,0,0) with non-zero mean" ,
          "ARIMA(2,0,0) with non-zero mean" ,"ARIMA(0,0,1) with non-zero mean")

Then use str_split_fixed to split it into separate columns, as follows:

library(stringr)

df <- data.frame(str_split_fixed(vect,"\\s|\\(|\\)|,",n=5))
# The separators are whitespace (\\s), parentheses ( \\( and \\) ) and commas (,).
# n=5 stops after four splits, so the fifth piece keeps the whole " with ..." suffix.

names(df) <- c("Model","p","d","q","outcome")
# Rename the columns to match the question:
# p is the number of autoregressive terms (AR)
# d is the number of nonseasonal differences needed for stationarity (order of differencing)
# q is the number of lagged forecast errors in the prediction equation (MA)

df$outcome_ <- gsub("\\s|-","_",trimws(df$outcome))
#cleaning the outcome column by replacing spaces and dashes with underscores
dummy_mat <- data.frame(model.matrix(~outcome_-1,data=df))
#using model.matrix to calculate the dummies for drift and non zero mean, for the value of 1 meaning True and 0 meaning False
df_final <- data.frame(df[,1:4],dummy_mat)

Result

#   Model p d q outcome_with_drift outcome_with_non_zero_mean
# 1 ARIMA 2 1 0                  1                          0
# 2 ARIMA 2 0 0                  0                          1
# 3 ARIMA 2 0 0                  0                          1
# 4 ARIMA 2 0 0                  0                          1
# 5 ARIMA 0 0 1                  0                          1
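
As a complement to the stringr approach, here is a minimal base R sketch of the same extraction using regexec and regmatches. It is not part of the original answer. The helper name parse_arima_names is hypothetical, and the sketch assumes every name starts with a model name followed by "(p,d,q)". It returns p, d and q as integers, and it builds the two flag columns with grepl, so both columns exist even when a suffix never occurs in the input.

```r
# Hypothetical helper (not from the original answer): parse names such as
# "ARIMA(2,1,0) with drift" into one row per model.
parse_arima_names <- function(x) {
  # Capture the model name and the three integers in the first parentheses.
  pattern <- "^([A-Za-z]+)\\(([0-9]+),([0-9]+),([0-9]+)\\)"
  parts <- regmatches(x, regexec(pattern, x))

  # A name that does not match yields character(0); pad it with NA so that
  # every element has length 5 (full match plus four capture groups).
  parts <- lapply(parts, function(m) {
    if (length(m) == 5) m else rep(NA_character_, 5)
  })
  m <- do.call(rbind, parts)

  data.frame(
    Model = m[, 2],
    p = as.integer(m[, 3]),
    d = as.integer(m[, 4]),
    q = as.integer(m[, 5]),
    # 1 means the suffix is present, 0 means it is absent.
    outcome_with_drift = as.integer(grepl("with drift", x, fixed = TRUE)),
    outcome_with_non_zero_mean =
      as.integer(grepl("with non-zero mean", x, fixed = TRUE)),
    stringsAsFactors = FALSE
  )
}

vect <- c("ARIMA(2,1,0) with drift", "ARIMA(2,0,0) with non-zero mean",
          "ARIMA(2,0,0) with non-zero mean", "ARIMA(2,0,0) with non-zero mean",
          "ARIMA(0,0,1) with non-zero mean")

result <- parse_arima_names(vect)
print(result)
```

Names that do not fit the assumed pattern get NA in Model, p, d and q instead of raising an error, which makes unexpected input easy to spot.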