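I have a data frame with numeric columns named A, B, C and D. I am trying to fit linear regression models using every possible combination of these variables, such as A, A + B, A + C, B, B + C ...
I am having trouble generating the combinations from the data frame.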
Data frame
DependentVar A B C D
I am trying to produce something like this:
Combinations of the independent variables, such as:
var <- A,B,C,D,A+B,A+C,A+D,B+C,B+D,C+D,A+B+C,A+B+D and so on..
for (v in var){
models <- lm (DependentVar ~ eval(parse(text=v)), data=data)
r2 <- append(summary(models)$r.squared)
}
The output should be a data frame like this:
Variable combination   Model R2
A                      0.8
B                      0.7
.
.
and so on.
Any help would be greatly appreciated!
Answer 0 (score: 1)
You have the right idea, but you can improve on it by 1) using lapply() and 2) using as.formula():
set.seed(1)
d<-data.frame(DV=rnorm(100,mean=100,sd=10),A=rnorm(100,mean=100,sd=10),B=rnorm(100,mean=100,sd=10))
formula_list<-list(as.formula('DV ~ A'),
as.formula('DV ~ B'),
as.formula('DV ~ A + B'))
lapply(formula_list, FUN = lm, data=d)
To get the output data frame, you can use the same mechanism, but instead of FUN = lm, set FUN to a wrapper around lm that post-processes each regression.
lm_wrapper<-function(formula, data){
reg_res<-lm(formula, data=data)
rsq<-summary(reg_res)$r.squared
return(data.frame(formula=as.character(formula)[3], rsq=rsq))
}
all_res<-lapply(formula_list, FUN = lm_wrapper, data=d)
all_res_stack<-do.call('rbind',all_res)
Here is what all_res_stack looks like:
> all_res_stack
formula rsq
1 A 0.004809535
2 B 0.026144428
3 A + B 0.026821577
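The formula list above is written out by hand, which does not scale to every combination of A, B, C and D. As a rough sketch (not part of the original answer), the list could instead be generated with combn() and reformulate() from base R. The names `predictors`, `all_formulas` and `results`, and the extra simulated column C, are assumptions made for illustration.

```r
set.seed(1)
d <- data.frame(DV = rnorm(100, mean = 100, sd = 10),
                A  = rnorm(100, mean = 100, sd = 10),
                B  = rnorm(100, mean = 100, sd = 10),
                C  = rnorm(100, mean = 100, sd = 10))

predictors <- c("A", "B", "C")  # hypothetical predictor names

# For each subset size k, combn() enumerates the subsets of predictors and
# reformulate() turns each subset into a formula such as DV ~ A + B.
all_formulas <- do.call(c, lapply(seq_along(predictors), function(k) {
  combn(predictors, k,
        FUN = function(vars) reformulate(vars, response = "DV"),
        simplify = FALSE)
}))

# Fit every model and stack the R-squared values into one data frame.
results <- do.call(rbind, lapply(all_formulas, function(f) {
  fit <- lm(f, data = d)
  data.frame(formula = paste(deparse(f[[3]]), collapse = ""),
             rsq = summary(fit)$r.squared)
}))
print(results)
```

With n predictors this fits 2^n - 1 models, so it is only practical for a small number of variables.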
Answer 1 (score: 0)
set.seed(123)
mydata <- data.frame(A = rnorm(10, mean = 5),
B = rnorm(10, mean = 10),
C = rnorm(10, sd = 2),
D = rnorm(10, sd = 5))
mydata$DependentVar <- with(mydata, A + B + C + D + rnorm(10))
# expand.grid makes a data.frame, where each possible combination of values is
# given a row. Here, each row states which variables to use in a model. Remove
# the row where no variables are used.
independent_vars <- c('A', 'B', 'C', 'D')
include_choices <- lapply(independent_vars, function(x) c(TRUE, FALSE))
names(include_choices) <- independent_vars
combos <- do.call('expand.grid', args = include_choices)
combos <- combos[apply(combos, 1, any), ]
# Use combos to construct each model
predict_some_cols <- function(which_cols) {
model_vars <- c('DependentVar', colnames(combos)[which_cols])
lm(DependentVar ~ ., data = mydata[, model_vars])
}
model_list <- apply(combos, 1L, predict_some_cols)
# A rather awkward way to build the model names; improvements are welcome
names(model_list) <- apply(combos, 1,
FUN = function(which_cols) {
paste0(colnames(combos)[which_cols],
collapse = ' + ')
})
# Now go through the models and get the desired data.
rsquared <- vapply(model_list,
function(model) summary(model)$r.squared,
numeric(1))
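The code above stops at a named vector, while the question asks for a data frame. A minimal sketch of that last step follows. To keep it runnable on its own, it rebuilds a small stand-in for `model_list` with only two predictors; `r2_table` and its column names are names introduced here, not part of the original answer.

```r
# Stand-in for the objects built above (assumption: only A and B, to keep it short).
set.seed(123)
mydata <- data.frame(A = rnorm(10, mean = 5),
                     B = rnorm(10, mean = 10))
mydata$DependentVar <- with(mydata, A + B + rnorm(10))
model_list <- list("A"     = lm(DependentVar ~ A, data = mydata),
                   "B"     = lm(DependentVar ~ B, data = mydata),
                   "A + B" = lm(DependentVar ~ A + B, data = mydata))
rsquared <- vapply(model_list,
                   function(model) summary(model)$r.squared,
                   numeric(1))

# One row per model, sorted from highest to lowest R-squared.
r2_table <- data.frame(combination = names(rsquared),
                       r2 = unname(rsquared),
                       stringsAsFactors = FALSE)
r2_table <- r2_table[order(r2_table$r2, decreasing = TRUE), ]
rownames(r2_table) <- NULL
print(r2_table)
```

Plain R-squared never decreases when a predictor is added, so the largest model will always rank first in such a table; summary(model)$adj.r.squared may be a more useful column when comparing models of different sizes.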