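I want to create a function that works with any data frame, from a minimum of 1 column up to a maximum of n columns. The function must run a simple linear regression for each independent variable. I know I have to use a loop, but I don't know how to use it. I tried this, but it doesn't work: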
>data1<-read.csv(file.choose(),header=TRUE,sep=",")
>n<-nrow(data1)
>PredictorVariables <- paste("x", 1:n, sep="")
>Formula <-paste("y ~ ", PredictorVariables, collapse=" + ",data=data1)
>lm(Formula, data=data1)
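Answer 0 (score: 0)
Here is an approach that uses lapply() with the mtcars data set. We select mpg as the dependent variable, extract the names of the remaining columns from the data set, and then use lapply() to run a regression model on each element of the indepVars vector. The output of each model is saved to a list that holds the name of the independent variable along with the resulting model object.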
indepVars <- names(mtcars)[!(names(mtcars) %in% "mpg")]
modelList <- lapply(indepVars,function(x){
  message("x is: ",x)
  result <- lm(mpg ~ mtcars[[x]],data=mtcars)
  list(variable=x,model=result)
})
# print the first model
modelList[[1]]$variable
summary(modelList[[1]]$model)
The extract operator [[ can then be used to print the contents of any model.
...and the output:
> # print the first model
> modelList[[1]]$variable
[1] "cyl"
> summary(modelList[[1]]$model)
Call:
lm(formula = mpg ~ mtcars[[x]], data = mtcars)
Residuals:
    Min      1Q  Median      3Q     Max
-4.9814 -2.1185  0.2217  1.0717  7.5186
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.8846     2.0738   18.27  < 2e-16 ***
mtcars[[x]]  -2.8758     0.3224   -8.92 6.11e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.206 on 30 degrees of freedom
Multiple R-squared: 0.7262, Adjusted R-squared: 0.7171
F-statistic: 79.56 on 1 and 30 DF, p-value: 6.113e-10
>
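As a follow-up to printing a single model, the sketch below shows one way to pull the same few numbers out of every model and stack them into one table. It rebuilds modelList the same way as above (without the message() call). The name resultTable and the choice of columns are illustrative assumptions, not part of the original answer.

```r
# Rebuild the list of models: one simple regression of mpg per predictor.
indepVars <- names(mtcars)[!(names(mtcars) %in% "mpg")]
modelList <- lapply(indepVars, function(x) {
  result <- lm(mpg ~ mtcars[[x]], data = mtcars)
  list(variable = x, model = result)
})

# Hypothetical summary table: one row per model.
resultTable <- do.call(rbind, lapply(modelList, function(item) {
  s <- summary(item$model)
  data.frame(
    variable  = item$variable,
    slope     = unname(coef(item$model)[2]),            # second coefficient is the slope
    r.squared = s$r.squared,
    p.value   = unname(s$coefficients[2, "Pr(>|t|)"])   # p-value of the slope
  )
}))

resultTable
```

The first row should match the summary shown above for cyl (slope of about -2.8758 and R-squared of about 0.7262).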
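In response to a comment from the original poster, here is the code needed to wrap the lapply() approach above in an R function. The function regList() takes a data frame and the name of the dependent variable as a string, then regresses the dependent variable on each remaining variable in the data frame passed to the function.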
regList <- function(dataframe,depVar) {
  indepVars <- names(dataframe)[!(names(dataframe) %in% depVar)]
  modelList <- lapply(indepVars,function(x){
    message("x is: ",x)
    result <- lm(dataframe[[depVar]] ~ dataframe[[x]],data=dataframe)
    list(variable=x,model=result)
  })
  modelList
}
modelList <- regList(mtcars,"mpg")
# print the first model
modelList[[1]]$variable
summary(modelList[[1]]$model)
Various components can be extracted from the individual model objects. The output looks like this:
> modelList <- regList(mtcars,"mpg")
x is: cyl
x is: disp
x is: hp
x is: drat
x is: wt
x is: qsec
x is: vs
x is: am
x is: gear
x is: carb
> # print the first model
> modelList[[1]]$variable
[1] "cyl"
> summary(modelList[[1]]$model)
Call:
lm(formula = dataframe[[depVar]] ~ dataframe[[x]], data = dataframe)
Residuals:
    Min      1Q  Median      3Q     Max
-4.9814 -2.1185  0.2217  1.0717  7.5186
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)     37.8846     2.0738   18.27  < 2e-16 ***
dataframe[[x]]  -2.8758     0.3224   -8.92 6.11e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.206 on 30 degrees of freedom
Multiple R-squared: 0.7262, Adjusted R-squared: 0.7171
F-statistic: 79.56 on 1 and 30 DF, p-value: 6.113e-10
>
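In the output above, the slope is labelled dataframe[[x]] rather than cyl, because the formula refers to the column through an expression and not by name. A possible variant, sketched here under the assumption that all column names are syntactically valid R names, builds each formula from strings with reformulate(). The function name regListByName is hypothetical and not part of the original answer.

```r
# Hypothetical variant of regList(); the name regListByName is illustrative only.
regListByName <- function(dataframe, depVar) {
  stopifnot(is.data.frame(dataframe), depVar %in% names(dataframe))
  indepVars <- setdiff(names(dataframe), depVar)
  models <- lapply(indepVars, function(x) {
    # reformulate() builds the formula depVar ~ x from character strings
    lm(reformulate(x, response = depVar), data = dataframe)
  })
  names(models) <- indepVars   # allow lookup by predictor name
  models
}

models <- regListByName(mtcars, "mpg")
coef(models[["cyl"]])                                  # coefficients named (Intercept) and cyl
predict(models[["wt"]], newdata = data.frame(wt = 3))  # prediction for a new wt value
```

Because each model refers to its predictor by column name, predict() can be given new data, which is not possible with the dataframe[[x]] form.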
Answer 1 (score: 0)
How about the following:
First, I create some sample data:
# Sample data
set.seed(2017);
x <- sapply(1:10, function(x) x * seq(1:100) + rnorm(100));
df <- data.frame(Y = rowSums(x), x);
Next, I define a custom function:
# Custom function where
# df is the source dataframe
# idx.y is the column index of the response variable in df
# idx.x.min is the column index of the first explanatory variable
# idx.x.max is the column index of the last explanatory variable
# The function returns a list of lm objects
myfit <- function(df, idx.y, idx.x.min, idx.x.max) {
    # <= (rather than <) so that a single explanatory variable is also allowed
    stopifnot(idx.x.min <= idx.x.max, idx.x.max <= ncol(df));
    res <- list();
    for (i in idx.x.min:idx.x.max) {
        res[[length(res) + 1]] <- lm(df[, idx.y] ~ df[, i]);
    }
    return(res);
}
I then run myfit on the sample data.
lst <- myfit(df, 1, 2, 11);
The return object lst is a list of 11-2+1 = 10 fit results of class lm. For example,
lst[[1]];
#
#Call:
#lm(formula = df[, idx.y] ~ df[, i])
#
#Coefficients:
#(Intercept)      df[, i]
#     -5.121       55.100
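Since every fit in lst prints with the same df[, i] label, it can be hard to tell which column a given fit belongs to. A minimal sketch of one way around this, assuming the same sample data as above, is to name the list entries after the columns. The names idx, fits and slopes are illustrative and not from the original answer.

```r
# Same sample data as above.
set.seed(2017)
x <- sapply(1:10, function(k) k * seq_len(100) + rnorm(100))
df <- data.frame(Y = rowSums(x), x)

idx <- 2:11                                    # column indices of the explanatory variables
fits <- lapply(idx, function(i) lm(df[[1]] ~ df[[i]]))
names(fits) <- names(df)[idx]                  # label each fit with its column name

# Slope of every fit as a named numeric vector.
slopes <- sapply(fits, function(m) unname(coef(m)[2]))
slopes
```

Individual fits can then be retrieved by name, for example fits[["X1"]].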
For future posts, I recommend that you take a look at how to ask good questions here on SO, and that you provide a minimal reproducible example/attempt, including sample data.