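I am trying to fit millions of models and then measure their out-of-sample predictive performance (choosing the lowest RMSE).
I initially used lapply() over several models fitted with lm() and then predicted the out-of-sample observations with predict.lm(), but it was too slow.
I read in other threads that lm.fit() is faster, and it is. But I am now unsure of the best way to make predictions from the coefficients.
I also tried the speedglm package, but it did not work for me.
Please see the code below.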
library(tidyverse) # the code below uses %>%, select(), mutate(), map() and tibble()
# findcombo gets every possible combination of variables and places them in a
# list of character vectors
findcombo <- function(x){
  do.call("c", lapply(seq_along(x), function(i) combn(x, i, FUN = list)))
}
combos <- data %>% select(2:ncol(data)) %>% colnames() %>% findcombo()
modelframe2 <- tibble(combos) #puts list as a column in a tibble
#dependent variable is labeled as dependent, function below creates dataframes with omitted NAs to be used for lm.fit
modelframe2 <- modelframe2 %>%
  mutate(indep = map(combos, function(x){
    data %>% select(dependent, one_of(x)) %>% na.omit()
  }))
#reg performs lm.fit on the matrices produced by the function above and generates a table of coefficients
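reg <- function(mat){
  # first column is the response; drop = FALSE keeps a single predictor as a named matrix column
  X <- cbind(1, as.matrix(mat[, -1, drop = FALSE]))
  lm.fit(X, mat[[1]])$coefficients
}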
#applies reg across each possible combination of variables
modelframe2 <- modelframe2 %>% mutate(models = map(indep, reg))
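
To make concrete what predicting from the coefficients could look like: with lm.fit(), a prediction is the held-out design matrix multiplied by the coefficient vector. The following is only a minimal sketch on made-up data, not a settled approach. The names toy, train, test, beta and rmse_oos are hypothetical and do not come from my real data, and it assumes the held-out matrix has its columns in the same order as the one used for the fit.

```r
# Hypothetical toy data: the response is in the first column, as in reg() above.
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$dependent <- 1 + 2 * toy$x1 - 0.5 * toy$x2 + rnorm(n, sd = 0.1)
toy <- toy[, c("dependent", "x1", "x2")]

# Illustrative split into training and held-out rows.
train <- toy[1:150, ]
test  <- toy[151:200, ]

# Fit on the training rows; lm.fit() needs an explicit intercept column.
X_train <- cbind(1, as.matrix(train[, -1, drop = FALSE]))
beta <- lm.fit(X_train, train[[1]])$coefficients

# Predict the held-out rows with a single matrix product.
X_test <- cbind(1, as.matrix(test[, -1, drop = FALSE]))
pred <- drop(X_test %*% beta)

# Out-of-sample RMSE for this one model.
rmse_oos <- sqrt(mean((test[[1]] - pred)^2))

# Cross-check against the slower lm() + predict() route.
fit_lm <- lm(dependent ~ ., data = train)
all.equal(unname(pred), unname(predict(fit_lm, newdata = test)))
```

One caveat with this sketch: if a fit is rank-deficient, lm.fit() returns NA for the aliased coefficients, so the matrix product will also be NA unless those columns are dropped first.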