How to obtain multiple regression formulas via vectorization

Posted: 2018-07-18 17:15:49

Tags: r vectorization lm

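Suppose I have the following code, which runs one regression per group and stores both the lm models and the stepwise-selected lm models in tibbles: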

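library(MASS)    # stepAIC(); loaded first so it does not mask dplyr::select()
library(dplyr)
library(tibble)
set.seed(1)

df <- data.frame(A = sample(3, 10, replace = TRUE),
                 B = sample(100, 10, replace = TRUE),
                 C = sample(100, 10, replace = TRUE))
df <- df %>% arrange(A)

groups <- unique(df$A)
# One row per group; the fitted models are stored in list-columns
formula_df <- tibble(A = groups, model = vector("list", length(groups)))
aic_df <- tibble(A = groups, model = vector("list", length(groups)))

for (k in seq_along(groups)) {
    temp <- df %>% filter(A == groups[k])

    formula_df$model[[k]] <- lm(B ~ C, data = temp)

    # Stepwise selection starting from this group's model
    aic_df$model[[k]] <- stepAIC(formula_df$model[[k]], direction = "both", trace = FALSE)
}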

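Can this be vectorized to make it faster, for example with one of the *apply functions? When the data gets large, the loop becomes extremely slow. Thanks in advance.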

1 answer:

Answer 0 (score: 1)

You could try the following, which fits both models group by group:

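fit_group <- function(d) {  # fit and step in one frame so stepAIC() can refit using `d`
    full <- lm(B ~ C, data = d)
    step <- stepAIC(full, direction = "both", trace = FALSE)
    tibble(formula_model = list(full), aic_model = list(step))
}
model <- df %>% group_by(A) %>% group_modify(~ fit_group(.x)) %>% ungroup()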
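The question asks specifically about the *apply family. Below is a minimal base-R sketch of the same per-group workflow using split() and lapply(). Some parts are assumptions. The data frame is hypothetical: three groups of 20 rows, which is more than the question's 10 rows, so every group has enough points for B ~ C. The helper name fit_group is illustrative and does not come from the original post. Both models are fitted inside one function because stepAIC() refits reduced models by re-evaluating the stored lm() call in its caller's frame, so `d` has to be visible there.

```r
library(MASS)  # stepAIC()

set.seed(1)
# Hypothetical example data: 3 groups with 20 rows each
df <- data.frame(A = rep(1:3, each = 20),
                 B = sample(100, 60, replace = TRUE),
                 C = sample(100, 60, replace = TRUE))

# Hypothetical helper: fit the full model and its stepwise refinement for one
# group. Both happen in this function's frame, so when stepAIC() re-evaluates
# `lm(..., data = d)` for a reduced model, it can still find `d`.
fit_group <- function(d) {
    full <- lm(B ~ C, data = d)
    step <- stepAIC(full, direction = "both", trace = FALSE)
    list(formula_model = full, aic_model = step)
}

# split() partitions the rows once; lapply() then applies fit_group to each group
groups <- split(df, df$A)
results <- lapply(groups, fit_group)

# Pull each kind of model out into its own named list (one element per group)
formula_models <- lapply(results, `[[`, "formula_model")
aic_models <- lapply(results, `[[`, "aic_model")

# Inspect the formula that stepwise selection kept for each group
sapply(aic_models, function(m) deparse(formula(m)))
```

lapply() still loops internally, so most of the run time will probably remain in lm() and stepAIC(). The main saving over the original loop is that split() partitions the rows once, instead of calling filter() on the full data for every group. If the per-group fits are the bottleneck, parallel::mclapply() can replace lapply() on Unix-alike systems.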