Regression on combinations of data frame columns in R

Time: 2016-06-23 19:09:15

Tags: r


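I have a data frame with numeric columns named A, B, C and D. I am trying to fit linear regression models on these variables using every possible combination of them, such as A, A + B, A + C, B, B + C, and so on. I am having trouble generating the combinations from the data frame.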

Data frame
DependentVar A B C D 

I am trying to produce something like this, with combinations of the independent variables such as:

var <- A,B,C,D,A+B,A+C,A+D,B+C,B+D,C+D,A+B+C,A+B+D and so on..
for (v in var){
models <- lm (DependentVar ~ eval(parse(text=v)), data=data)
r2 <- append(summary(models)$r.squared)
}

The output should be a data frame like this:

Variable combination  Model R2    
A                      0.8
B                      0.7
.
.

and so on.
Any help would be greatly appreciated!

2 answers:

Answer 0 (score: 1)

You have the right idea, but you can improve on it by 1) using lapply() and 2) using as.formula():

set.seed(1)
d<-data.frame(DV=rnorm(100,mean=100,sd=10),A=rnorm(100,mean=100,sd=10),B=rnorm(100,mean=100,sd=10))

formula_list<-list(as.formula('DV ~ A'),
                   as.formula('DV ~ B'),
                   as.formula('DV ~ A + B'))

lapply(formula_list, FUN = lm, data=d)
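
To connect this with the loop in the question, here is a minimal sketch that builds the formulas from a character vector of right-hand sides instead of calling eval(parse()) inside lm(). The names rhs and fits are illustrative and do not appear in the original answer, and the data is made up.

```r
set.seed(1)
# Made-up data with one response (DV) and two predictors
d <- data.frame(DV = rnorm(100), A = rnorm(100), B = rnorm(100))

# Right-hand sides as plain strings, like the `var` vector in the question
rhs <- c("A", "B", "A + B")

# paste() builds the formula text, as.formula() turns it into a formula object
formula_list <- lapply(rhs, function(v) as.formula(paste("DV ~", v)))

# Fit one model per formula; extra arguments to lapply() are passed on to lm()
fits <- lapply(formula_list, lm, data = d)
```

Each element of fits is an ordinary lm object, so summary(fits[[1]])$r.squared works as usual.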

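To get the output as a data frame, you can use the same mechanism, but instead of FUN = lm, set FUN to a wrapper around lm that post-processes each regression.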

lm_wrapper<-function(formula, data){
  reg_res<-lm(formula, data=data)
  rsq<-summary(reg_res)$r.squared
  return(data.frame(formula=as.character(formula)[3], rsq=rsq))
}

all_res<-lapply(formula_list, FUN = lm_wrapper, data=d)

all_res_stack<-do.call('rbind',all_res)

Here is what all_res_stack looks like:

> all_res_stack
  formula         rsq
1       A 0.004809535
2       B 0.026144428
3   A + B 0.026821577
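
The formula list in this answer is typed out by hand, which does not scale to the four predictors in the question. As a rough sketch, assuming the column names from the question, the list could be generated with base R's combn() and reformulate(). The names predictors, all_formulas and results are illustrative, and the data is random.

```r
set.seed(1)
# Random example data using the column names from the question
d <- data.frame(DependentVar = rnorm(100, mean = 100, sd = 10),
                A = rnorm(100), B = rnorm(100),
                C = rnorm(100), D = rnorm(100))

predictors <- c("A", "B", "C", "D")

# For each subset size k = 1..4, enumerate every subset of that size and
# turn it into a formula such as DependentVar ~ A + C
all_formulas <- do.call(c, lapply(seq_along(predictors), function(k) {
  combn(predictors, k, simplify = FALSE,
        FUN = function(vars) reformulate(vars, response = "DependentVar"))
}))

# Fit every model and collect the R-squared values in one data frame
results <- data.frame(
  combination = vapply(all_formulas,
                       function(f) paste(all.vars(f)[-1], collapse = " + "),
                       character(1)),
  r_squared = vapply(all_formulas,
                     function(f) summary(lm(f, data = d))$r.squared,
                     numeric(1)),
  stringsAsFactors = FALSE
)
```

With four predictors this fits 2^4 - 1 = 15 models. The count doubles with every added predictor, so this approach is only practical for a small number of columns.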

Answer 1 (score: 0)

set.seed(123)

mydata <- data.frame(A = rnorm(10, mean = 5),
                     B = rnorm(10, mean = 10),
                     C = rnorm(10, sd = 2),
                     D = rnorm(10, sd = 5))
mydata$DependentVar <- with(mydata, A + B + C + D + rnorm(10))

# expand.grid makes a data.frame, where each possible combination of values is
# given a row. Here, each row states which variables to use in a model. Remove
# the row where no variables are used.
independent_vars <- c('A', 'B', 'C', 'D')
include_choices <- lapply(independent_vars, function(x) c(TRUE, FALSE))
names(include_choices) <- independent_vars

combos <- do.call('expand.grid', args = include_choices)

combos <- combos[apply(combos, 1, any), ]

# Use combos to construct each model
predict_some_cols <- function(which_cols) {
  model_vars <- c('DependentVar', colnames(combos)[which_cols])
  lm(DependentVar ~ ., data = mydata[, model_vars])
}

model_list <- apply(combos, 1L, predict_some_cols)

# A rather awkward way to build the names; improvements are welcome
names(model_list) <- apply(combos, 1,
                           FUN = function(which_cols) {
                             paste0(colnames(combos)[which_cols],
                                    collapse = ' + ')
                           })

# Now go through the models and get the desired data.
rsquared <- vapply(model_list,
                   function(model) summary(model)$r.squared,
                   numeric(1))